#MathVISTA - Twitter Hashtag

4 months ago

lol at everyone shilling AGI bags when GPT still gets rugged by geometry homework, wake me up when it can run a backtest in Excel without hallucinating commas #MathVISTA $VioletAI #Solana #Memecoin

VioletAi_chan's tweet photo: https://t.co/jUDiATUU5N

0

4

0

23

Pan Lu

@lupantech

almost 2 years ago

🚀 o1 is now released by @OpenAI! It's trained to think slowly with a long chain of thought. It works impressively and may unlock hard tasks in science and math, setting a new SOTA with 73.2% on #MathVista! Leaderboard: https://t.co/odcmmezNth Blog: https://t.co/bfjHTNCsbX

OpenAI

@OpenAI

almost 2 years ago

We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. https://t.co/peKzzKX1bu

931

17K

4K

3K

8M

6

236

41

62

37K

Taqdirulaziz

@XTAHD

almost 2 years ago

It is explained that Grok-2 and Grok-2 mini now hold the top two spots on #MathVista, according to the website (https://t.co/clPFIXfbpK)! The point being made is how impressive the rapid boost by the xAI team is, raising the Grok series' scores from 52.8% to 69% in just 4 months.🥶 🚀 Truly impressed by the remarkable progress from @xai. Respect. NOTE: This sentiment is reflected in several posts from different users, indicating general acceptance of this achievement in the community discussing AI and machine learning on the platform. However, for the time being, these posts suggest a high level of performance for Grok-2 and Grok-2 mini on the math and visual tasks benchmarked by MathVista. #XAI #MathAI #GenAI #ElonMusk #GROK2

137

90

0

526

Benoist Rousseau : Trader, Entrepreneur & Auteur

@benoistrousseau

almost 2 years ago

@lupantech @xai Absolutely amazing progress from @xai! The Grok series' rapid rise to the top of the #MathVista leaderboard is truly impressive. Kudos to the team for their hard work and dedication.

0

1

0

328

Pan Lu

@lupantech

almost 2 years ago

🚀 Truly impressed by the remarkable progress from @xai! Grok-2 and Grok-2 mini now hold the top two spots on #MathVista (https://t.co/odcmmezNth)! Even more impressive is the rapid boost by @xai, raising the Grok series' scores from 52.8% to 69% in just 4 months. Respect! 👏 #XAI #MathAI #GenAI #ElonMusk

12

304

37

33

41K

Pan Lu

@lupantech

about 2 years ago

🚀 Excited to see Claude 3.5 Sonnet by @AnthropicAI achieve a new SOTA on #MathVista with 67.7%, a 19.8% improvement over Claude 3 Sonnet! 📈🎉 Learn more: 📝 Blog: https://t.co/rXjPn6d77t 🔢 MathVista: https://t.co/kf2dU6ATDn

Anthropic

@AnthropicAI

about 2 years ago

Introducing Claude 3.5 Sonnet—our most intelligent model yet. This is the first release in our 3.5 model family. Sonnet now outperforms competitor models on key evaluations, at twice the speed of Claude 3 Opus and one-fifth the cost. Try it for free: https://t.co/uLbS2JMEK9

419

7K

2K

1K

3M

1

41

7

4

6K

Pan Lu

@lupantech

about 2 years ago

🧵 3/N We conducted extensive experiments on 7 vision-language benchmarks, including #ScienceQA, #TextVQA, #ChartQA, LLaVA-Bench, #MMBench, MM-Vet, and #MathVista. STIC achieves consistent and significant performance improvements, with an average accuracy gain of 4.0% over the base LVLM and a notable gain of 6.4% on ScienceQA.

0

4

0

508

Pan Lu

@lupantech

about 2 years ago

Congrats, @JeffDean @GoogleDeepMind! Gemini 1.5 Pro has shown substantial improvements from Feb to May, scoring 63.9% on our #MathVista (https://t.co/kf2dU6ATDn), outperforming humans and GPT-4o, which was out 4 days ago!🚀 AI Progress has never been this rapid and impressive!🌟

Jeff Dean

@JeffDean

about 2 years ago

Gemini 1.5 Model Family: Technical Report updates now published In the report we present the latest models of the Gemini family – Gemini 1.5 Pro and Gemini 1.5 Flash, two highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. Our latest report details notable improvements in Gemini 1.5 Pro within the last four months. Our May release demonstrates significant improvement in math, coding, and multimodal benchmarks compared to our initial release in February. Furthermore, the 1.5 Pro Model is now stronger than 1.0 Ultra. The latest Gemini 1.5 Pro is now our most capable model for text and vision understanding tasks, surpassing 1.0 Ultra on 16 of 19 text benchmarks and 18 of 21 of the vision understanding benchmarks. The table below highlights the improvement in average benchmark performance for different categories in 1.5 Pro since Feb, and also shows the strength of the model relative to the 1.0 Pro and 1.0 Ultra models. The 1.5 Flash model also compares very well against the 1.0 Pro and 1.0 Ultra models. One clear example of this can be seen on MMLU On MMLU we find that 1.5 Pro surpasses 1.0 Ultra in the regular 5-shot setting scoring 85.9% versus 83.7%. However with additional inference compute, via majority voting on top of multiple language model samples, we can get a performance of 91.7% versus Ultra’s 90.0%, which extends the known performance ceiling of this task. @OriolVinyalsML and I are very proud of the whole Gemini team, and it’s fantastic to see this progress and to share these highlights from our Gemini Model Family. Read the updated report here: https://t.co/CTzTHND4nQ
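The majority-voting step mentioned above can be sketched roughly as follows. This is a minimal illustration only, not the Gemini team's evaluation code: the function name `majority_vote` and the sample answers are hypothetical, and the sketch assumes each sampled response has already been reduced to a short final answer string.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among sampled model outputs.

    Ties go to the answer that was seen first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common orders by count; equal counts keep first-seen order.
    return counts.most_common(1)[0][0]


# Hypothetical final answers from five samples of one multiple-choice question.
samples = ["B", "C", "B", "A", "B"]
print(majority_vote(samples))  # B
```

Drawing more samples costs more inference compute, which is the trade-off the tweet refers to.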

27

955

224

333

568K

7

284

58

71

73K

Hritik Bansal

@hbXNov

about 2 years ago

A fitting wrap-up to my @iclr_conf ✨ @OpenAI GPT-4o benchmarked and made significant advancements on our #MathVista dataset, achieving a whopping score of 63.8%! @lupantech @kaiwei_chang @uclanlp https://t.co/B1lJQPnH3L

Pan Lu

@lupantech

about 2 years ago

Today, we presented our #MathVista (https://t.co/kf2dU6ATDn) at #ICLR2024 in Vienna! 🌟 We are thrilled by the tremendous progress in math reasoning in the era of LLMs and VLMs. MathVista has become one of the most reliable benchmarks for probing their abilities in visual math reasoning. 📊🧠 With 10K downloads in the last month and features in LLaVA, Gemini, Claude 3, MM1, Grok-1.5V, etc., it's making waves! 🚀 Thanks to Hritik @hbXNov for the engaging talk. 👏 This massive, collaborative effort wouldn't be possible without the invaluable contributions from @uclanlp, @uwnlp, and @MSFTResearch: @hbXNov, Tony Xia, @liujc1998, @ChunyuanLi, @HannaHajishirzi, @kelvinih, @kaiwei_chang, Michel Galley, and @JianfengGao021. 🙌🎉

3

68

8

15K

2

29

3

7K

Pan Lu

@lupantech

about 2 years ago

Today, we presented our #MathVista (https://t.co/kf2dU6ATDn) at #ICLR2024 in Vienna! 🌟 We are thrilled by the tremendous progress in math reasoning in the era of LLMs and VLMs. MathVista has become one of the most reliable benchmarks for probing their abilities in visual math reasoning. 📊🧠 With 10K downloads in the last month and features in LLaVA, Gemini, Claude 3, MM1, Grok-1.5V, etc., it's making waves! 🚀 Thanks to Hritik @hbXNov for the engaging talk. 👏 This massive, collaborative effort wouldn't be possible without the invaluable contributions from @uclanlp, @uwnlp, and @MSFTResearch: @hbXNov, Tony Xia, @liujc1998, @ChunyuanLi, @HannaHajishirzi, @kelvinih, @kaiwei_chang, Michel Galley, and @JianfengGao021. 🙌🎉

Pan Lu

@lupantech

over 2 years ago

🚀Excited to release our 112-page study on math reasoning in visual contexts via #MathVista. For the first time, we provide both quantitative and qualitative evaluations of #GPT4V, #Bard, & 10 other models.

📄✨Full paper: https://t.co/O0pT4pmn12
🔗Proj: https://t.co/kf2dU6ATDn

🔍 Key Insights:
1️⃣ #GPT4V achieves a 49.9% accuracy, notably surpassing #Bard by 15.1%. However, it's still 10.4% behind human performance.
2️⃣ #GPT4V exhibits an emergent ability of self-verification, enabling it to autonomously check and refine its outcomes in a single inference – a feature absent in other models.
3️⃣ #GPT4V highlights its potential through self-consistency and multi-turn human-AI dialogues.

📜 Arxiv: https://t.co/yikZNtGqMr (updated soon)
🛠️ Code: https://t.co/uXzDybmxgU
📊 @huggingface Data: https://t.co/99qRsen5kJ
🔍 Visualization: https://t.co/dkbrICA2CX
🏆 Leaderboard: https://t.co/odcmmezNth

A massive shoutout to our outstanding team from @uclanlp, @uwnlp, and @MSFTResearch:
@hbXNov, Tony Xia, @liujc1998, @ChunyuanLi, @HannaHajishirzi, @kelvinih, @kaiwei_chang, Michel Galley, and @JianfengGao0217 🧵1/N
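The Bard and human scores implied by the first key insight can be worked out with a short calculation. This is a sketch under one assumption: the 15.1% and 10.4% gaps are read as absolute percentage points rather than relative changes. The variable names are illustrative.

```python
gpt4v = 49.9           # GPT-4V accuracy quoted in the tweet (%)
lead_over_bard = 15.1  # gap to Bard, assumed to be percentage points
gap_to_human = 10.4    # gap to human performance, assumed to be percentage points

bard = round(gpt4v - lead_over_bard, 1)
human = round(gpt4v + gap_to_human, 1)
print(f"Bard: {bard}%  Human: {human}%")  # Bard: 34.8%  Human: 60.3%
```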

15

310

77

128

85K

3

68

8

15K

Hritik Bansal

@hbXNov

about 2 years ago

I will present #MathVista on Tuesday, Oral1C, Halle A2, 10:15am-10:30am local time. I will also present the poster in Halle B#6 from 10:45am to 12:45pm local time.

Pan Lu

@lupantech

over 2 years ago

🚀Excited to release our 112-page study on math reasoning in visual contexts via #MathVista. For the first time, we provide both quantitative and qualitative evaluations of #GPT4V, #Bard, & 10 other models.

📄✨Full paper: https://t.co/O0pT4pmn12
🔗Proj: https://t.co/kf2dU6ATDn

🔍 Key Insights:
1️⃣ #GPT4V achieves a 49.9% accuracy, notably surpassing #Bard by 15.1%. However, it's still 10.4% behind human performance.
2️⃣ #GPT4V exhibits an emergent ability of self-verification, enabling it to autonomously check and refine its outcomes in a single inference – a feature absent in other models.
3️⃣ #GPT4V highlights its potential through self-consistency and multi-turn human-AI dialogues.

📜 Arxiv: https://t.co/yikZNtGqMr (updated soon)
🛠️ Code: https://t.co/uXzDybmxgU
📊 @huggingface Data: https://t.co/99qRsen5kJ
🔍 Visualization: https://t.co/dkbrICA2CX
🏆 Leaderboard: https://t.co/odcmmezNth

A massive shoutout to our outstanding team from @uclanlp, @uwnlp, and @MSFTResearch:
@hbXNov, Tony Xia, @liujc1998, @ChunyuanLi, @HannaHajishirzi, @kelvinih, @kaiwei_chang, Michel Galley, and @JianfengGao0217 🧵1/N

15

310

77

128

85K

1

5

1

1K

Pan Lu

@lupantech

about 2 years ago

🎉 Exciting news! Our #MathVista is excelling with the latest advances in vision-language models (VLMs). Grok-1.5V by @xai achieves a 52.8% score, surpassing leading models such as GPT-4V, Claude 3 Opus, and Gemini Pro 1.5! 🔗 Visit our project page: https://t.co/kf2dU6ATDn 👀 Explore our dataset on @huggingface: https://t.co/99qRsen5kJ 📝 Read more insights in our arxiv paper: https://t.co/yikZNtGqMr

xAI

@xai

about 2 years ago

👀 https://t.co/etua7Jqih8

693

6K

947

772

24M

1

45

4

9

8K

Pan Lu

@lupantech

over 2 years ago

Excited to see the breakthrough achieved by @Apple's MM1 model, as evidenced by our #MathVista (https://t.co/oZsHNVrSTc), the comprehensive benchmark for math reasoning in visual contexts!

Brandon McKinzie

@mckbrando

over 2 years ago

Few-shot mixed-resolution CoT: we can keep the strong few-shot capabilities learned from multimodal pre-training even after instruction-tuning: MM1-30B-Chat achieves 39.4 zero-shot on MathVista, but with eight-shot CoT mixed-resolution prompting we can achieve 44.4.

1

23

4

5

7K

0

19

1

4

2K

Pan Lu

@lupantech

over 2 years ago

🤯So thrilled to have @AnthropicAI benchmark their latest, powerful Claude 3 models on our #MathVista for visual math reasoning! It's encouraging to see the rapid progress in (multimodal) LLMs, especially in the math and science fields! 💥 🤗 Our @huggingface Data: https://t.co/99qRsen5kJ 🔗 Project: https://t.co/kf2dU6ATDn Claude 3 blog: https://t.co/LhZoQhWCPb 🔍 Claude 3 report: https://t.co/Nh2IMcpczF

Anthropic

@AnthropicAI

over 2 years ago

Today, we're announcing Claude 3, our next generation of AI models. The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.

527

9K

2K

4M

1

52

7

10

10K

Andres Vilariño 🇪🇦

@andresvilarino

over 2 years ago

#Researchers from @UCLA, University of Washington @UW , & @Microsoft Introduce #MathVista: Evaluating Math Reasoning in Visual Contexts with #GPT4v, #BARD, and Other #LargeMultimodalModels #LargeLanguageModel #LLMs #ArtificialIntelligence #AI https://t.co/kvoEA99d7m

0

2

0

63

Parzivale

@_parzivale_veve

over 2 years ago

@lupantech @_akhaliq @huggingface Congratulations to Pan Lu and the team on getting MathVista presented at ICLR 2024! The blend of math and visual reasoning is a true innovation. Eager to see how this propels the field forward. 🚀 #ICLR2024 #MathVista #AIResearch #Innovation #TechCommunity

1

2

0

344

Pan Lu

@lupantech

over 2 years ago

💥💥Update Alert! Radar graphs & leaderboard on #MathVista now feature detailed scores for the #Gemini family models. 🚀 🔍 Insight: Gemini Ultra leads the pack, outperforming GPT-4V by 3.1%! Yet, each model shines uniquely in various math reasoning & visual contexts. 🙏 Big thanks to @GoogleDeepMind's Gemini Team for these insights. Excited for future collaborations within the AI community! Explore more: 🔗 Leaderboard: https://t.co/kf2dU6ATDn 👨‍💻 Github: https://t.co/uXzDybmxgU

1

81

16

20

16K

Pan Lu

@lupantech

over 2 years ago

It is remarkable that Gemini achieves a new SOTA of 53.0% on MathVista (https://t.co/oZsHNVrSTc), a challenging benchmark for math reasoning in visual contexts. We are honored that our proposed #MathVista is advancing the development of the newest and most capable AI models.

Jeff Dean

@JeffDean

over 2 years ago

In image understanding, Gemini performs well across all the benchmarks we examined, with the Ultra model setting new state-of-the-art results in every benchmark.

4

174

9

11

45K

0

32

3

2

4K

Pan Lu

@lupantech

over 2 years ago

🚀 @google is introducing new updates to aid in learning math and science, especially in visual contexts: https://t.co/qrBsiXy0v8. 💥 We're proud to spotlight our commitment to math and science over the past years, with projects like #MathVista, #Chameleon, and #ScienceQA. 1️⃣ MathVista: A 112-page study of evaluating math reasoning in visual contexts, with 12 large models such as #GPT_4V and #Bard on our new benchmark. https://t.co/kf2dU6ATDn 2️⃣ Chameleon: A framework that integrates various tools for math and science problems. https://t.co/pzfCQvddAR 3️⃣ ScienceQA: A multimodal benchmark for science, featuring annotations of lectures and solutions. https://t.co/dfTC0EFU8l 4️⃣ SciBench: A college-level benchmark focusing on science. https://t.co/0CHtkxbZZa 5️⃣ TheoremQA: a college-level benchmark for math reasoning, emphasizing theorem applications. https://t.co/E6zTZck5ns 6️⃣ Geometry3K: A benchmark for geometry problems, complemented with parsing annotations of logical forms and our leading neuro-symbolic approach. https://t.co/Na9OpsqZpO Dive deeper with: 7️⃣ PromptPG/TabMWP: https://t.co/bLetcMfWed 8️⃣ DL4Math: https://t.co/ywDiWaA6Yu 9️⃣ Lila: https://t.co/X2v8Rpjk0d 🔟 IconQA: https://t.co/PkDNYVFxkl *️⃣ UniGeo: https://t.co/3kNXAEm5KP

0

33

10

12

6K

Pan Lu

@lupantech

over 2 years ago

🚀 Google is introducing new updates to aid in learning math and science, especially in visual contexts. 💥 We're proud to spotlight our commitment to math and science over the past years, with projects like #MathVista, #Chameleon, and #ScienceQA. 1️⃣ MathVista: A 112-page study of evaluating math reasoning in visual contexts, with 12 large models such as #GPT_4V and #Bard on our new benchmark. https://t.co/kf2dU6ATDn 2️⃣ Chameleon: A framework that integrates various tools for math and science problems. https://t.co/pzfCQvddAR 3️⃣ ScienceQA: A multimodal benchmark for science, featuring annotations of lectures and solutions. https://t.co/dfTC0EFU8l 4️⃣ SciBench: A college-level benchmark focusing on science. https://t.co/0CHtkxbZZa 5️⃣ TheoremQA: a college-level benchmark for math reasoning, emphasizing theorem applications. https://t.co/E6zTZck5ns 6️⃣ Geometry3K: A benchmark for geometry problems, complemented with parsing annotations of logical forms and our leading neuro-symbolic approach. https://t.co/Na9OpsqZpO Dive deeper with: 7️⃣ PromptPG/TabMWP: https://t.co/bLetcMfWed 8️⃣ DL4Math: https://t.co/ywDiWaA6Yu 9️⃣ Lila: https://t.co/X2v8Rpjk0d 🔟 IconQA: https://t.co/PkDNYVFxkl *️⃣ UniGeo: https://t.co/3kNXAEm5KP @google https://t.co/qrBsiXy0v8

0

2

0

2

801
