Kaize Ding

@kaize0409

Assistant Professor @Northwestern; CS Ph.D. @ASU DMML; Reliable and Efficient AI; Formerly @GoogleDeepMind, @MSFTResearch, @Amazon Alexa AI

Tempe, AZ

Joined March 2018

328 Following

861 Followers

74 Posts

Kaize Ding @kaize0409

8 days ago

@weijie444 Congratulations!


Kaize Ding @kaize0409

11 months ago

@realhuawei @SCAI_ASU Congratulations!


kaize0409 retweeted

Biology+AI Daily @BiologyAIDaily

about 1 year ago

A Survey of Large Language Models for Text-Guided Molecular Discovery: from Molecule Generation to Optimization

1. This is the first focused survey on using large language models (LLMs) for molecule generation and optimization, introducing a novel taxonomy based on learning paradigms that covers both tuning-free (e.g., zero-shot, in-context learning) and tuning-based (e.g., supervised fine-tuning, preference tuning) methods.

2. The survey highlights how LLMs are uniquely positioned for molecular discovery due to their emergent capabilities (such as in-context learning, reasoning, and instruction following), which allow them to generalize across diverse chemical tasks without task-specific retraining.

3. In molecule generation, LLMs are deployed via prompting strategies (e.g., LLM4GraphGen, MolReGPT) or adapted through supervised datasets (e.g., Mol-Instructions, LlaSMol, ChatMol). Preference-tuned models like SmileyLlama and Mol-MoE show improved fidelity to molecular constraints.

4. For molecule optimization, the review examines how LLMs refine existing molecules through goal-directed editing. Strategies include zero-shot optimization (LLM-MDE), retrieval-augmented prompting (ChatDrug), and evolution-based in-context learning (MOLLM, LLM-EO).

5. The survey identifies a trend toward hybrid frameworks combining fine-tuned worker models with external reasoning agents (e.g., MultiMol, DrugAssist), often leveraging GPT-4o or domain-specific scoring functions to enhance candidate selection and validation.

6. Multi-modal modeling is a growing focus, with models like UniMoT and Molx-Enhanced LLM incorporating graph or 3D inputs into LLMs via specialized tokenizers and embedding schemes, enabling structurally aware generation and optimization.

7. Benchmarking frameworks are categorized into structure-based (validity, uniqueness, diversity) and property-based (LogP, QED, synthetic accessibility, Pareto-optimality) metrics. The paper also provides a detailed summary of standard datasets for pretraining and evaluation.

8. The survey emphasizes the limitations of current LLMs: hallucinations, lack of transparency, and domain-incoherent outputs. Future work should prioritize trustworthy generation, interpretability, and error-aware prompting to enhance reliability.

9. Emerging directions include LLM-driven agent frameworks that integrate external tools (e.g., retrosynthesis engines, docking software) for iterative design, as well as cross-modal models that jointly encode chemical topology, text, and spatial information.

10. A continuously updated repository of LLM-centric molecular research is provided on GitHub, making this survey a central resource for the field.

💻Code: https://t.co/UxjZHjKwSK
📜Paper: https://t.co/DxzJMudNWX
#LLM #MoleculeGeneration #MolecularOptimization #DrugDiscovery #ChemLLM #AI4Science #InContextLearning #SMILES #MolecularDesign #LargeLanguageModels


Kaize Ding @kaize0409

about 1 year ago

The research fellow will work on a collaborative project and will be jointly advised by me, Prof. Noelle Samia, and Prof. Bonnie Martin-Harris.



Kaize Ding @kaize0409

about 1 year ago

📣 We are seeking exceptional postdoctoral candidates on AI4Health at Northwestern University! Please share with anyone who might be interested in this exciting opportunity! #Postdoc #AI4Health #ML #AI #LLM #MedicalAI #Northwestern



kaize0409 retweeted

Yichuan Li

@hai72724774

over 1 year ago

📢 New paper: AskGNN - Making LLMs graph-aware through in-context learning! Our GNN-powered retriever + learning-to-retrieve approach enables LLMs to process graph data effectively. No fine-tuning needed. 7 LLMs tested, 3 tasks, strong results. 🔗 https://t.co/TtVDLC6mE5

Kaize Ding @kaize0409

over 1 year ago

@liu_xiaorui Congrats!


kaize0409 retweeted

SIGKDD 2026 @kdd_news

about 2 years ago

🚨 Call for Workshop Papers at #KDD2024 🚨 Submit your paper to the KDD’24 Workshop on Resource-Efficient Learning for Knowledge Discovery 📆 June 30 https://t.co/bhRQN0vwmx

kaize0409 retweeted

Siddharth Bhatia

@siddharthb_

about 2 years ago

***Internship Opportunity***

We're hiring interns. Come, join us in building an ML platform reinvented for real-time. You'll gain hands-on experience in building ML systems from the ground up.

💸 Stipend: ₹1L/month
🗓️ Start Date: May/June 2024
📍 Location: Virtual
🚀 Career Path: We'll roll out PPOs to top performers

Open Roles:
1. ML
2. Infra
3. DevOps
4. Streaming Systems
5. UI/UX

If you're interested, please fill out this form: https://t.co/mz64pL6xW9
#internship #hiring


Kaize Ding @kaize0409

about 2 years ago

@yuz9yuz @TAMU Big congrats!


kaize0409 retweeted

WSDM Conference @WSDMSocial

over 2 years ago

We heard that you missed the benefits of the standard registration. Don't worry, the deadline has been extended to February 18th. Register today: https://t.co/PX75CIVcvW #WSDM2024 #WSDMCUP2024 #sigkdd #sigmod #sigir #sigweb #acm


Kaize Ding @kaize0409

over 2 years ago

📢 Interested in statistical machine learning and data science? Don't forget to submit your application to our Ph.D. program before Jan 5th! Details can be found at: https://t.co/m0gbMQBCG4 If you are also attending NeurIPS, feel free to talk to me!

kaize0409 retweeted

WSDM Conference @WSDMSocial

over 2 years ago

#WSDM2024 is open for registration. Make sure to grab your spot soon. Early registration ends December 17th! Come learn from industry and academia experts such as @Google's VP Elizabeth Hamon Reid and Nicolas Cristin (@nc2y). Register now at https://t.co/PX75CIVcvW


Kaize Ding @kaize0409

over 2 years ago

@SharonYixuanLi Big Congratulations!


kaize0409 retweeted

Yichuan Li

@hai72724774

over 2 years ago

🎉 Exciting News! 📢 Our paper "GRENADE: Graph-Centric Language Model for Self-Supervised Representation Learning on Text-Attributed Graphs" has been accepted at #EMNLP Findings! 📚🔍 Thanks to the co-authors @kaize0409 and Kyumin Lee. Stay tuned for the preprint and code.

kaize0409 retweeted

data insight miner @yanjiefoo

over 2 years ago

Working with Prof. Liu and Prof. Wang (my former Ph.D. and visiting Ph.D. students) to organize a special issue on Data-Centric AI. Mathematics is a very decent journal. We are looking forward to your discussions on such interesting topics.

kaize0409 retweeted

WSDM Conference @WSDMSocial

over 2 years ago

We are pleased to announce the call for Workshop Proposals for #WSDM2024, which will take place for the first time in LATAM at Mérida, México

https://t.co/m4o3NhPfh2

Proposals Due: October 5, 2023
Acceptance Notifications: November 2, 2023


Kaize Ding @kaize0409

almost 3 years ago

Happy to chat about research if you are around! Also, I'm recruiting students to join my group at Northwestern University. Let me know if you are interested!

Kaize Ding @kaize0409

almost 3 years ago

Heading to Long Beach for KDD'23! This time I will present our recent work "Learning Strong Graph Neural Networks with Weak Information". If you are interested in data-efficient graph learning, you are welcome to join the oral and poster sessions!


Kaize Ding @kaize0409

almost 3 years ago

Oral: 4:00 pm – 4:20 pm, Tuesday, August 8, Room 201A
Poster: 4:00 pm – 4:20 pm, Monday, August 7, Hall A, #386

