Daniel Weld @dsweld - Twitter Profile

about 1 month ago

This benchmark for AI scientific capabilities is beautifully thought out - I especially like its clear enumeration of design principles...

Ai2 @allen_ai

about 1 month ago

New AstaBench results show frontier models making progress on scientific research, but the benchmark remains far from solved. Claude Opus 4.7 leads overall at 58.0%, while GPT-5.5 comes within 5.1 points at less than half the measured cost per problem. 🧵

Photo: https://t.co/90njufZK0z

dsweld retweeted

Zixian Ma@CVPR

@zixianma02

2 months ago

We built MolmoWeb from scratch with Molmo2!!! 💕🌐 It’s not easy to build SOTA web agents out of open source VLMs, when they can be so profitable that very few projects release everything (if anything), esp. the datasets 🔑 But we just released all the MolmoWeb model checkpoints and datasets from Ai2 😉 Can’t wait to see what the community builds on top of MolmoWeb! 🫡

dsweld retweeted

Ai2 @allen_ai

3 months ago

🔎 Deep research agents like Asta ScholarQA and OpenAI Deep Research are transforming how we perform literature review. But how do we know if the way we evaluate them is actually meaningful? Announcing our new paper: “Deep Research, Shallow Evaluation: A Case Study in Meta-Evaluation for Long-Form QA Benchmarks” 🧵

dsweld retweeted

Pao Siangliulue @Siangliulue

3 months ago

Are you a researcher in CS or a CS-adjacent field curious about how an AI agent can help you with your research project? Want to try a new research support tool in a paid user study ($100, 2 hr)? Spots are limited. See details and sign up here: https://t.co/lAhe3zNUK1

dsweld retweeted

Ai2 @allen_ai

3 months ago

Can AI predict what scientists will do next—not just one piece, but the whole research process? PreScience is our new model eval for forecasting how science unfolds end-to-end, from how research teams form to a paper's eventual impact. Built with @UChicago, supported by @NSF.

Photo: https://t.co/iU5WzT4w0U

Daniel Weld @dsweld

4 months ago

Truly open scientific question answering - that's good! https://t.co/cG3i5HLvKP

dsweld retweeted

Ai2 @allen_ai

4 months ago

We’re releasing the Theorizer code and framework + a dataset of ~3,000 theories generated by Theorizer across the field of AI/NLP, built from 13,744 source papers. 💻 Code: https://t.co/C5zr2Nm9c7 📝 Technical report: https://t.co/3LUiDkXyvc ✍️ Learn more in our blog: https://t.co/OkCG3LCqtE

Daniel Weld @dsweld

4 months ago

I'm so excited by this! Our system is generating some insightful & novel theories (e.g., internally for LM post-training). And it's still getting better!

Ai2 @allen_ai

4 months ago

Introducing Theorizer: Turning thousands of papers into scientific laws 📚➡️📜 Most automated discovery systems focus on experimentation. Theorizer tackles the other half of science: theory building—compressing scattered findings into structured, testable claims. 🧵

Photo: https://t.co/nbWlbc9MCk

dsweld retweeted

Ai2 @allen_ai

4 months ago

Introducing Ai2 Open Coding Agents—starting with SERA, our first-ever coding models. Fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, & it works with Claude Code out of the box. 🧵

Photo: https://t.co/dor94O62B9

Daniel Weld @dsweld

5 months ago

Smart analysis of scholarly output when authors adopted LLMs as part of their writing: 1) huge 36% boost in # papers published 2) LLMs mitigate skill disparities, e.g., native language - enough to shift market share of production toward China https://t.co/GPaak6dguv @yian_yin

dsweld retweeted

Ai2 @allen_ai

6 months ago

🆕 New in Asta: multi-turn report generation. You can now have back-and-forth conversations with Asta, our agentic platform for scientific research, to refine long-form, fully cited reports instead of relying on single-shot prompts.

Photo: https://t.co/ah5JsKxHGW

dsweld retweeted

Ai2 @allen_ai

6 months ago

🧠 Introducing NeuroDiscoveryBench. Built with @AllenInstitute, it’s the first benchmark for evaluating AI systems like our Asta DataVoyager agent on neuroscience data. The benchmark tests whether AI can truly extract insights from complex brain datasets.

Photo: https://t.co/nPeOjO5F2u

dsweld retweeted

Bodhisattwa Majumder

@mbodhisattwa

6 months ago

#NeurIPS2025 and AI x Science? Some fun announcements are coming up. Stay tuned. Also, our Asta internship application is still open -- apply and mention my name if you'd like to work w me ~

dsweld retweeted

Ai2 @allen_ai

7 months ago

Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵

Photo: https://t.co/vnGrArA44X

dsweld retweeted

Bodhisattwa Majumder

@mbodhisattwa

7 months ago

Plenty of AI-gen papers in ICLR. Wonder why? 🚨 In a preregistered Randomized Controlled Trial, we find: CS authors perceive AI-abstracts as more readable, tend to edit less than their published counterparts. AI-use and its disclosure shape the fabric of collaborative scientific writing. Work led by @hsanchaita & @leadoeun27, advised by @shocheen & yours truly. 1/n

Daniel Weld @dsweld

7 months ago

Impressive deep-research performance by a tiny & open model!

Rulin Shao @RulinShao

7 months ago

🔥 Thrilled to introduce DR Tulu-8B, an open long-form Deep Research model that matches OpenAI DR 💪 Yes, just 8B! 🚀
The secret? We present Reinforcement Learning with Evolving Rubrics (RLER) for long-form non-verifiable DR tasks! Our rubrics:
- co-evolve with the policy model
- are grounded on search knowledge
🧵

Daniel Weld @dsweld

7 months ago

The benchmark desiderata alone make this paper worth a read...

Jonathan Bragg @turingmusician

7 months ago

Agent benchmarks don't measure true *AI* advances. We built one that's hard & trustworthy 👉 AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems 👉 SOTA results across 22 agent *classes* 👉 AgentBaselines agents suite 🆕 https://t.co/BFjdGCAp1w 🧵👇

Daniel Weld @dsweld

7 months ago

Super interesting and well written summary of the incredible progress we’ve made on climate change (and what’s most important to do next) ⭐️⭐️⭐️⭐️⭐️ https://t.co/hSensrfSdY

dsweld retweeted

Ai2 @allen_ai

8 months ago

📊 Today we're releasing data showing which scientific papers our AI research tool Asta cites most frequently. Think of it as creating citation counts for the AI era—tracking which research is actually powering AI answers across thousands of queries. 🧵

Daniel Weld @dsweld

8 months ago

Pretty amazing that this can be done at all, but especially with federated data (crucial given the sensitivity of patient data)!

Ai2 @allen_ai

8 months ago

Introducing Asta DataVoyager—our new AI capability in Asta that turns structured data into transparent, reproducible insights. Built for scientists, grounded in open, inspectable workflows. 🧵
