Yingtong Dou @dozee_sim - Twitter Profile

17 days ago

Some new results I found surprising that I’m tweeting for Chris (who isn't on here). With enough compute, the best data filter for LMs (on DCLM) might be no filter. Why? Large models can tolerate a surprising amount of nominally 'low quality' data, and can sometimes even benefit.

tatsu_hashimoto's tweet photo: https://t.co/VhshLOWBIx

32

1K

152

908

218K

dozee_sim retweeted

Drew Breunig

@dbreunig

4 months ago

If you're doing applied AI research (especially system design, benchmarks, evals, efficiency, or ops) you should be submitting to the Conference on AI and Agentic Systems... https://t.co/xGRBKOqXgJ

dbreunig's tweet photo: https://t.co/ibAEIl1QC3

7

264

34

244

26K

Yingtong Dou @dozee_sim

7 months ago

🧵(4/6) 𝐑𝐞𝐚𝐥-𝐰𝐨𝐫𝐥𝐝 𝐈𝐦𝐩𝐚𝐜𝐭. TGPT achieves a significant improvement over a production model on transaction classification. TGPT excels at generating realistic future transaction trajectories, opening up new avenues for forecasting and personalization.

0

40

Yingtong Dou @dozee_sim

7 months ago

🧵(3/6) 3𝐃-𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫 𝐀𝐫𝐜𝐡𝐢𝐭𝐞𝐜𝐭𝐮𝐫𝐞. We design a new architecture, specifically to model and fuse the complex, multi-modal nature of transaction data. This approach hierarchically encodes transaction features, individual transactions, and their sequences.

1

0

46

Yingtong Dou @dozee_sim

9 months ago

@tydsh I've been tuning MobileLLM on domain-specific data and its performance is excellent. Definitely try R1 later.

0

1

0

99

dozee_sim retweeted

Guohao Li 🐫

@guohao_li

10 months ago

Introducing Eigent: the first multi-agent workforce on your desktop. Eigent is a team of AI agents collaborating to complete complex tasks in parallel. It is your long-term working partner with fully customizable workers and MCPs. Public beta available to download for macOS and Windows. 100% open-source on GitHub. Comment for 500 extra credits.

142

683

137

803

221K

dozee_sim retweeted

Michael Galkin @michael_galkin

about 1 year ago

📣 Our spicy ICML 2025 position paper: “Graph Learning Will Lose Relevance Due To Poor Benchmarks”. Graph learning is less trendy in the ML world than it was in 2020-2022. We believe the problem is in poor benchmarks that hold the field back - and suggest ways to fix it! 🧵1/10

michael_galkin's tweet photo: https://t.co/T7qzokGCSR

5

294

50

170

84K

dozee_sim retweeted

FORTUNE

@FortuneMagazine

over 1 year ago

Visa President of Technology Rajat Taneja rebuilt the company’s data platform from scratch, helping position it for the generative AI boom. https://t.co/fJ6NUr4ccW

1

7

1

2

6K

Yingtong Dou @dozee_sim

over 1 year ago

@Wenhanacademia Welcome onboard Wenhan!

1

0

93

dozee_sim retweeted

Yangqing Jia

@jiayq

over 1 year ago

There is a lot of unconscious emphasis of the DeepSeek model being “Chinese” and implicit connection with the Sino-US relationship or the GPU power. In my eyes, the success of DeepSeek has little to do with that. It is simple intelligence and pragmatism at work: given a limit of computation and manpower present, produce the best outcome with smart research. Same with the AlexNet model when Alex Krizhevsky needed to make magic with 2 GPUs, and not a supercluster.

There are a lot of super smart AI people and companies in the world. In terms of the Chinese ethnic group, people I had the privilege to have worked with include (but are not limited to)
- Kaiming He who is the OG of modern computer vision.
- Song Han who founded DeePhi, OmniML and now professor at MIT.
- the DMLC folks who created early frameworks like MxNet and TVM.
- Bing Xu who did MxNet, was coauthor of GAN, founded HippoML and is now at NVidia.
- Orbeus, a startup on early CV applications and now the foundation of AWS ReKognition.
And many more. They ace in the frontier of AI, whether it’s research, product, small startups, or big companies.

AI should bring us closer rather than more separate. I was saddened by the discriminative comments given by Professor Rosalind Picard at NeurIPS, but was too busy to put my thoughts together and say something.

Looking back at 2024, I think what really stood out is the fundamental seek for AI breakthrough - collect what we have, use our brain, and achieve our best. It’s like the Olympics: faster, higher, stronger, together.

21

528

60

127

68K

dozee_sim retweeted

Zhuang Liu

@liuzhuang1234

over 1 year ago

How far is an LLM from not only understanding but also generating visually? Not very far! Introducing MetaMorph---a multimodal understanding and generation model. In MetaMorph, understanding and generation benefit each other. Very moderate generation data is needed to elicit visual generation from an LLM, when trained jointly with visual understanding.

25

718

133

537

253K

dozee_sim retweeted

Aurimas Griciūnas

@Aurimas_Gr

over 1 year ago

What is 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗥𝗔𝗚?

In real-world applications, simple naive RAG systems are rarely used nowadays. To provide correct answers to a user query, we are always adding some agency to the RAG system.

However, it is important to 𝗻𝗼𝘁 𝗴𝗲𝘁 𝗹𝗼𝘀𝘁 𝗶𝗻 𝘁𝗵𝗲 𝗯𝘂𝘇𝘇 𝗮𝗻𝗱 𝘁𝗲𝗿𝗺𝗶𝗻𝗼𝗹𝗼𝗴𝘆 and understand that there is 𝗻𝗼 𝘀𝗶𝗻𝗴𝗹𝗲 𝗯𝗹𝘂𝗲𝗽𝗿𝗶𝗻𝘁 for adding this agency to your RAG system; you should adapt it to your use case. My advice is to not get stuck on terminology and to think about engineering flows.

Let’s explore some of the moving pieces in Agentic RAG:

𝟭. Analysis of the user query: we pass the original user query to an LLM-based Agent for analysis. This is where:

➡️ The original query can be rewritten, sometimes multiple times, to create either a single query or multiple queries to be passed down the pipeline.
➡️ The agent decides if additional data sources are required to answer the query.

𝟮. If additional data is required, the Retrieval step is triggered. In the Agentic RAG case, we could have a single agent or multiple agents responsible for figuring out which data sources should be tapped into. A few examples:

➡️ Real-time user data. This is a pretty cool concept, as we might have some real-time information, like the current location, available for the user.
➡️ Internal documents that a user might be interested in.
➡️ Data available on the web.
➡️ …

𝟯. If there is no need for additional data, we try to compose the answer (or multiple answers) straight via an LLM.
𝟰. The answer (or answers) gets analyzed, summarized, and evaluated for correctness and relevance:

➡️ If the Agent decides that the answer is good enough, it gets returned to the user.
➡️ If the Agent decides that the answer needs improvement, we try to rewrite the user query and repeat the generation loop.

The real power of Agentic RAG lies in its ability to perform additional routing pre- and post-generation, handle multiple distinct data sources for retrieval if needed, and recover from failures in generating correct answers.

What are your thoughts on Agentic RAG? Let me know in the comments! 👇 #RAG #LLM #AI
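
The four steps in the tweet above can be sketched as a small control loop. This is only an illustrative sketch, not the author's implementation: the names `Analysis`, `agentic_rag`, `analyze`, `retrievers`, `generate`, `is_good_enough`, `rewrite`, and `max_rounds` are all hypothetical, and each LLM-backed step is passed in as a plain callable so the flow can be read (and run) without assuming any particular model or library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Analysis:
    """Result of step 1 (hypothetical shape, assumed for this sketch)."""
    queries: List[str]   # one or more rewritten queries
    sources: List[str]   # data sources to consult; empty means no retrieval


def agentic_rag(
    query: str,
    analyze: Callable[[str], Analysis],
    retrievers: Dict[str, Callable[[str], List[str]]],
    generate: Callable[[str, List[str]], str],
    is_good_enough: Callable[[str, str], bool],
    rewrite: Callable[[str, str], str],
    max_rounds: int = 3,
) -> str:
    """Run the analyze -> retrieve -> generate -> evaluate loop.

    Every callable stands in for an LLM- or tool-backed component;
    max_rounds is an assumed safeguard against looping forever.
    """
    current = query
    answer = ""
    for _ in range(max_rounds):
        # Step 1: analyze the query and decide whether extra data is needed.
        analysis = analyze(current)

        # Step 2: retrieve from every requested source for every query.
        context: List[str] = []
        for source in analysis.sources:
            for rewritten in analysis.queries:
                context.extend(retrievers[source](rewritten))

        # Step 3: compose an answer (context is empty if nothing was retrieved).
        answer = generate(current, context)

        # Step 4: evaluate against the original query; return or retry.
        if is_good_enough(query, answer):
            return answer
        current = rewrite(current, answer)

    # Out of rounds: return the last attempt rather than nothing.
    return answer
```

Passing the components as callables keeps the routing logic separate from the model calls, which matches the tweet's advice to think in terms of engineering flows rather than terminology.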

15

1K

224

1K

197K

dozee_sim retweeted

Jianfeng Chi @jianfengchi

over 1 year ago

I am hiring a research intern, working on LLM (Llama 3+) safety. The internship is expected to start in Summer/Spring 2025, based in New York City. Please drop me an email at [email protected] (Subject starts with "[2025 Intern]"). Learn more here: https://t.co/hpeYKwP0nG

3

262

30

270

32K

dozee_sim retweeted

William Wang

@WilliamWangNLP

over 1 year ago

Respectfully disagree. It's the structure of language and words that make LLMs effective. Pure speech, time series, or video without linguistic co-supervision don't yield the same results. Language provides the minimal conceptual units that enable these models to work.

9

155

15

69

31K

dozee_sim retweeted

Simran Arora

@simran_s_arora

almost 2 years ago

Excited to share Just read twice: going beyond causal language modeling to close quality gaps between efficient recurrent models and attention-based models!! There’s so much recent progress on recurrent architectures, which are dramatically more memory efficient and asymptotically faster than attention 💨 But there’s no free lunch 🥪 these models can’t fit all the information from long contexts into the limited memory, degrading in-context learning quality. Is all lost?

7

299

57

174

93K

Yingtong Dou @dozee_sim

almost 2 years ago

The review form of #NeurIPS 2024 is cumbersome!

0

2

0

541
