Yinkai Wang @YinkaiW - Twitter Profile

7 months ago

@anthonygitter Re the typo: (1) we randomly sample from all possible mutation sites for the 4-site targets, and (2) we treat those not tested in the original dataset as invalid. We will revise them in our next version. 2/


Yinkai Wang @YinkaiW

7 months ago

@anthonygitter Thanks for the constructive feedback! We'd be happy to talk more about more realistic evaluation targets; we agree that the targets and ML oracles are not ideal! 1/


Yinkai Wang @YinkaiW

7 months ago

Benchmarks shouldn’t be school exams. SDE tests whether LLMs can do science: hypothesize, experiment, analyze, iterate. 8 projects, 43 scenarios, 1125 questions, built with 50+ scientists. 🧪⚛️

Chenru Duan

@chenru_duan

7 months ago

Are LLMs ready for real scientific discovery? To find out, we gathered 50+ scientists from 20+ institutions to establish a multi-level evaluation framework: not only on questions, but also on research scenarios and projects.

Current science benchmarks (like GPQA and MMMU) ask AI to answer quizzes. But science isn't a quiz. It’s an iterative loop of hypothesis, experiment, and analysis. Mastery of static, decontextualized questions, even if perfect, does not guarantee readiness for discovery, just as earning straight A’s in coursework does not indicate a great researcher.

Today, we introduce Scientific Discovery Evaluation (SDE): a benchmark grounded in real-world research projects. There, research projects are decomposed into modular research scenarios from which vetted questions are sampled. LLMs are evaluated at two levels:

1. Question-level: targeted, expert-written problems embedded in real research scenarios (elucidating structure from NMR, forward reaction prediction, etc.), NOT sub-domains (analytical chemistry, inorganic materials, etc.)
2. Project-level: realistic scientific discovery loops (e.g., molecular design, materials discovery, protein engineering) where models must iteratively propose, test, and refine hypotheses.

With the joint effort of 50+ scientists from 20+ institutes, we gathered 8 projects, 43 research scenarios, and 1125 questions. Evaluation on these multiple levels reveals where current models succeed, where they fail, and why.

It is a great joy to work with a 50+ author team for the first time in my life. Thanks to you all for making it happen. @hello_jocelynlu, @YuanqiD, @BotaoYu24, @HowieH36226, @rogerluorl18, @YuanhaoQ, @YinkaiW, @Haorui_Wang123, @JeffGuo__, @SherryLixueC, @MengdiWang10, @lecong, @ParshinShojaee @KexinHuang5 @chandankreddy, @realadityanandy, @pschwllr, @KulikGroup, @hhsun1, @MoosaviSMohamad, and many others who are not in the x-universe.

Also, it’s exciting to see a concurrent release from @OpenAI on FrontierScience yesterday (@MilesKWang)! Their findings on the need for harder, expert-vetted evals, especially the huge performance gap between Olympiad and research questions, echo ours. SDE takes this a step further by moving beyond expert-level Q&A to explicitly evaluate the end-to-end discovery loop with project-level execution, which makes finer-grained observations possible.

Core Findings Below:


YinkaiW retweeted

7 months ago

Molmo2 is here! Have spent the whole year working on the data part and more. It’s a great opportunity to apply what I’ve learned during my past exploration of data-centric AI, and I learned a lot more about video models.


YinkaiW retweeted

Biology+AI Daily @BiologyAIDaily

over 1 year ago

Large Language Model is Secretly a Protein Sequence Optimizer

1/ This paper demonstrates that large language models (LLMs), originally trained on massive text datasets, can be effectively used as protein sequence optimizers. By integrating them into a directed evolutionary framework, LLMs optimize protein fitness without further fine-tuning, outperforming traditional random mutation methods.

2/ The authors propose an evolutionary method where LLMs generate new protein sequence candidates by performing mutations or crossovers on parent sequences, optimizing for high fitness while keeping edits minimal, even with constrained budgets.

3/ The study tests the LLM-based method across multiple optimization tasks, including single-objective, constrained, and multi-objective optimization, using both synthetic and experimental fitness landscapes. Results show that LLMs outperform traditional evolutionary algorithms (EA) on several benchmarks, especially in non-linear fitness landscapes.

4/ A significant innovation of this work is using LLMs directly to propose sequences for evolutionary processes, making the optimization of protein sequences faster and more efficient, especially when budget constraints or fitness landscapes are complex.

5/ The method is applied to several datasets, including Green Fluorescent Protein (GFP) and Adeno-Associated Virus (AAV) sequences, where the LLM-guided approach consistently yields higher fitness scores than baseline evolutionary algorithms.

6/ The researchers highlight the flexibility of LLMs in optimizing protein sequences, not only for single-objective tasks but also for more complex, multi-objective optimization, demonstrating their potential for high-throughput experiments and real-world applications in protein engineering.

7/ The work offers a promising alternative to conventional directed evolution, which often faces challenges such as slow progress and suboptimal convergence, by harnessing the power of LLMs to efficiently explore vast sequence spaces and identify high-performance variants.

@sohahassoun @xiaolinxu1 @lipingliulp @CanalJiananLi @XiaohuiC16528 @YuanqiD @jacksonleihao @YinkaiW
📜Paper: https://t.co/SSllydThWo
#ProteinEngineering #DirectedEvolution #MachineLearning #LLM #Bioinformatics #SyntheticBiology #ProteinOptimization
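
For readers who want a feel for the loop described in 2/, here is a minimal sketch of an evolutionary search over sequences. It is not the paper's implementation: the `propose` function is a random crossover-plus-mutation stand-in for the step where an LLM would be prompted with parent sequences, and `toy_fitness`, the hidden target string, and all parameter values are hypothetical, chosen only so the example runs on its own.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq, target):
    """Hypothetical oracle: fraction of positions that match a hidden target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def propose(parents, rng, n_children):
    """Stand-in for the LLM proposer (assumption, not the paper's prompt).

    Each child is a one-point crossover of two parents followed by a
    single point mutation, which keeps edits small.
    """
    children = []
    for _ in range(n_children):
        p1, p2 = rng.choice(parents), rng.choice(parents)
        cut = rng.randrange(1, len(p1))
        child = list(p1[:cut] + p2[cut:])
        pos = rng.randrange(len(child))
        child[pos] = rng.choice(AMINO_ACIDS)
        children.append("".join(child))
    return children


def evolve(start, target, rounds=30, pool_size=8, n_children=16, seed=0):
    """Propose children, score them, and keep the fittest as next parents."""
    rng = random.Random(seed)
    pool = [start]
    for _ in range(rounds):
        candidates = set(pool) | set(propose(pool, rng, n_children))
        # Sort by descending fitness, breaking ties alphabetically so the
        # result does not depend on set iteration order.
        ranked = sorted(candidates, key=lambda s: (-toy_fitness(s, target), s))
        pool = ranked[:pool_size]
    return pool[0]


if __name__ == "__main__":
    target = "MKTAYIAKQR"  # hypothetical hidden optimum
    best = evolve("A" * len(target), target)
    print(best, toy_fitness(best, target))
```

Because the current parents are always kept among the candidates, the best fitness never decreases from one round to the next. Swapping `propose` for a call to a language model is the only change the thread's method would require in this skeleton.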


YinkaiW retweeted

Yuanqi Du

@YuanqiD

over 1 year ago

📢📢📢 Happy to introduce Graph Generative Pre-trained Transformers (G2PT): Can we tokenize graphs and train an autoregressive (AR) model with generative pre-trained transformers to generate graphs? A new work led by @XiaohuiC16528, @YinkaiW, @jacksonleihao. A thread 🧵1/6


YinkaiW retweeted

Li-Ping Liu @lipingliulp

almost 3 years ago

Does your transformer have a class token to predict? If so you may want to apply normalization separately for that token: https://t.co/fOSuBXjNIk --> see you at #NeurIPS2023 . (PS: the main idea is from students :))


YinkaiW retweeted

Yuanqi Du

@YuanqiD

about 4 years ago

Happy to announce our new initiative AI4Science101. We wrote a series of documents to encourage knowledge sharing and collection in AI for Science, from the viewpoints of both AI and science researchers, to motivate them to learn about, join, and work on AI for Science. https://t.co/VAL6kuSqyY


Yinkai Wang @YinkaiW

about 4 years ago

Very excited to see that!

Learning on Graphs Conference 2026 @LogConference

about 4 years ago

Here it is: the first Learning on Graphs Conference! 🎊 We think this new venue will be valuable for the Graph/Geometric Machine Learning community.

What makes it so important+unique? See our blog post! https://t.co/WPPbxxLjrE 1/6 https://t.co/eDCePwNxPz


YinkaiW retweeted

Yuanqi Du

@YuanqiD

over 4 years ago

Machine learning for molecule design is a fast-growing field with a massive literature. To the best of our knowledge, we are the first to **comprehensively** review this field. The preprint is now available on arXiv https://t.co/6tweOqJUfi.


YinkaiW retweeted

Yanqiao ZHU @Zhu_Yanqiao

over 4 years ago

In the real world, noisy or incomplete graphs inevitably lead to unsatisfactory performance. To address the problem, researchers seek to simultaneously optimize the graph structure along with representations, which we refer to as graph structure learning. 1/2 https://t.co/EnxoPIna9m


YinkaiW retweeted

Yuanqi Du

@YuanqiD

over 4 years ago

If you are working on or interested in graph generation, you may want to check out our recent survey of deep graph generation methods and applications, available on arXiv now https://t.co/fD18i4HnZL.

