Ethan Dyer

engineering & research at anthropic. i don't check twitter DMs. email me!

about 2 years ago

I'm excited about this! Our team has been working really hard to improve Gemini 1.5 capabilities significantly on multiple fronts and in particular MATH/STEM! Please see the report here: https://t.co/Wi3bBNPewY

9

163

15

19

148K

ethansdyer retweeted

Oriol Vinyals

@OriolVinyalsML

about 2 years ago

Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra.

As a math undergrad, our drastic results in mathematics are particularly exciting to me!

In section 7 of the tech report, we present new results on a math-specialised variant of Gemini 1.5 Pro which performs strongly on competition-level math problems, including a breakthrough performance of 91.1% on Hendrycks’ MATH benchmark without tool-use (examples below 🧵).

Gemini 1.5 is widely available, try it out for free here https://t.co/GJXW8lduNk & read the full tech report here: https://t.co/Pltp92WcNo

42

979

188

315

713K

ethansdyer retweeted

Joshua Batson @thebasepoint

over 2 years ago

In writing this paper, there were countless features we thought might be bugs. After careful inspection, ~all of them revealed surprising and subtle model properties. To me this capacity for surprise is the true test of a new technique. This thread is about my favorite finding.

4

370

39

149

105K

ethansdyer retweeted

nature

@Nature

over 2 years ago

Nature research paper: Universality in long-distance geometry and quantum complexity https://t.co/J5KlkugA58

0

18

7

4

18K

ethansdyer retweeted

@bneyshabur

over 3 years ago

Excited to announce that the entire Blueshift team has joined @DeepMind! We will be working with @OriolVinyalsML and others to advance capabilities of LLMs developed by DM / Alphabet! We hope to continue to grow DM's presence in Bay Area and New York in the coming months :-)

31

1K

51

71

209K

ethansdyer retweeted

almost 4 years ago

If you are interested in solving challenging multi-step reasoning problems with LLMs, join us! We have an opening for a Research Scientist position at Blueshift! Learn more about the role & apply here: https://t.co/zDM9ooMLRN Learn about our team: https://t.co/eg6Obh2167

1

62

9

19

0

almost 4 years ago

@amirzait Great question! In https://t.co/RBS70Y20Ww we began to study memorization. We indeed looked at acc on modified questions, checked for MATH in the training data, and compared acc when removing answers similar to MATH. But this is an important direction for more follow up!

2

1

0

almost 4 years ago

1/ Super excited to introduce #Minerva 🦉(https://t.co/UI7zV0IXlS). Minerva was trained on math and science found on the web and can solve many multi-step quantitative reasoning problems.

alewkowycz @alewkowycz

almost 4 years ago

Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. https://t.co/bQJOyMSCD4

98

7K

1K

0

28

3K

512

548

0

almost 4 years ago

@HAKSOAT MMLU doesn't seem to have many pure E&M problems that require multiple steps. I agree it would be interesting to do a systematic evaluation. But here is one that I grabbed:

1

2

0

almost 4 years ago

@suzuki__r Yes, it is all done through reading TeX (or MathML, MathJax, etc.). It is very likely that the response will depend on the style.

1

0

almost 4 years ago

@KyleCranmer One fun aspect of how few shot prompting works with these generative models is we give: Question: ... Answer: ... ... Question: ... Answer: ... Question: And the model produces an answer. But then it keeps making up new questions and answers -- next year's pset 😉.

0

2

0
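
A minimal sketch of the few-shot format described in the reply above, assuming plain "Question:" / "Answer:" markers. The function names, the example problems, and the truncation step are illustrative assumptions, not Minerva's actual prompting or decoding code. It shows how a prompt of solved examples ends with an open "Answer:", and how a completion can be cut off once the model starts inventing the next question.

```python
def build_few_shot_prompt(examples, question):
    """Format solved (question, answer) pairs followed by one open question."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # The final block is left open so the model continues after "Answer:".
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


def truncate_at_next_question(completion, stop="Question:"):
    """Keep only the text generated before the model makes up a new question."""
    head, _, _ = completion.partition(stop)
    return head.strip()


if __name__ == "__main__":
    # Hypothetical examples, used only to show the layout of the prompt.
    prompt = build_few_shot_prompt([("What is 1 + 1?", "2")], "What is 2 + 2?")
    print(prompt)
    # A made-up completion that runs on into a new question, as the reply describes.
    raw = " 4\n\nQuestion: What is 3 + 3?\nAnswer: 6"
    print(truncate_at_next_question(raw))
```

Cutting at the next "Question:" marker is one simple way to drop the extra made-up questions; a stop sequence passed to the sampler would serve the same purpose where the sampling API supports it.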

almost 4 years ago

@holmesjtg We don't have any concrete plans, but are definitely very interested in how this can be adapted to be a helpful tutor, answer questions as students ask them (rather than as tests phrase them) etc... Do you have any favorite datasets for this?

3

6

0

almost 4 years ago

@pablo_derbez Without additional prompting, it can still be quite brittle to such things. On the other hand, we have seen examples where the problem's answer options assume some kind of rounding; Minerva solves the problem exactly and then correctly realizes it is supposed to round.

1

4

0

almost 4 years ago

3/ Find out more about Minerva in the blog post (https://t.co/UI7zV0IXlS), paper (https://t.co/RBS70Y20Ww) or explore more Minerva samples (https://t.co/zMcW595QpD)!

3

75

13

10

0

almost 4 years ago

2/ Among many impressive properties, one side effect of training on the web is that Minerva has seen text used to draw mathematical figures and so can sometimes reason about diagrams.

3

119

14

10

0

ethansdyer retweeted

Vedant Misra

@vedantmisra

almost 4 years ago

Thrilled to announce🦉Minerva: a large language model capable of solving mathematical problems using step-by-step reasoning in natural language. See blog here: https://t.co/eDtHy9oXci and samples here: https://t.co/GGECkO5Noo (1/n)

3

122

28

21

0

ethansdyer retweeted