Xiangkun Hu

3 months ago

Hello everyone! We are organizing a human review process for papers generated by FARS and are now recruiting volunteer reviewers to participate in evaluating these research outputs.

If you are interested in taking part, please fill out the application form: https://t.co/eJskTAMIzs

If your application is a good fit, we will contact you via email with instructions on how to participate in the review process.

For volunteer reviewers who complete the full review process, we will offer the following in appreciation of your contribution:
📝 You will be listed as an author of the FARS Review Report (in alphabetical order by name).
✨ $500 in non-expiring credits, granted after FARS launches as a product feature.

Thank you for your support. We look forward to working with you to help reshape the future of scientific discovery.

XiangkunHu retweeted

4 months ago

Update on the End of the FARS Live Deployment

The first public live deployment of FARS (Fully Automated Research System) has successfully concluded. We sincerely thank everyone for the attention, feedback, and support given to FARS.

Key updates:

1. On March 3, 2026, the FARS livestream ended. All papers and code generated during the deployment remain available on https://t.co/Doz2ERClQo.

2. Some FARS papers have been reviewed using Stanford Agentic Reviewer (https://t.co/KKXIQfLeLe). These results are available on the website but do not represent Analemma’s official evaluation. We are organizing a systematic human review, and will soon invite external reviewers to participate and oversee the process. A quality assessment report will be released afterward.

3. Due to arXiv policy restrictions, generative AI cannot be listed as paper authors. We are exploring other indexable channels to distribute selected FARS papers that pass human review, so their citations and impact can be tracked.

4. Later this month, we will launch a product, providing research assistance features and automated research capabilities as FARS demonstrated.

We will continue iterating as we aim to turn scientific discovery from a human-limited craft into scalable industrial production.

Analemma Team
#AI #LLM #Research

4 months ago

@linkangd @AnalemmaAI And sorry for the late reply 😅

4 months ago

@linkangd @AnalemmaAI Thanks for the review! The current version of FARS targets focused contributions that align with "short papers". And yes, AI Scientist (v2) is awesome work.

4 months ago

@linkangd @AnalemmaAI @_Sizhe_Chen_ Hi, what do you think about this paper?

XiangkunHu retweeted

4 months ago

Milestone update: FARS has been live for 39:33:37 — and has already produced 10 research papers. Check details of the produced papers: 🔴 FARS research runs - https://t.co/Wqq8Z6WoR3 📦 Github - https://t.co/8HkrGYnyIG

4 months ago

https://t.co/UbQpO1aMLe

XiangkunHu retweeted

4 months ago

Later, at 10:00 PM Eastern Time, we’ll begin the first public deployment of FARS. https://t.co/Bb8bVhvU4n

4 months ago

Fully automated research starts tomorrow.

4 months ago

Today, we’re introducing FARS, a Fully Automated Research System.

Tomorrow at 10:00 PM Eastern Time, we’ll begin its first public deployment as a live experiment.

During the deployment, FARS will run continuously and autonomously, aiming to produce 100 complete research papers.

This deployment is intended to study what automated research looks like at scale.

🔴 Live: https://t.co/v72yQVd0oB
📃 Blog: https://t.co/gh4Nc2Ufaj
📦 GitHub: https://t.co/a9vN1QIbK4
👾 Discord: https://t.co/dwRD3ijoBK

#AI #LLMs #research

XiangkunHu retweeted

Run-Ze Fan @Vfrz525_

11 months ago

🚨 New release: MegaScience
The largest & highest-quality post-training dataset for scientific reasoning is now open-sourced (1.25M QA pairs)!
📈 Trained models outperform official Instruct baselines
🔬 Covers 7+ disciplines with university-level textbook-grade QA
📄 Paper: https://t.co/tW6Qjv54L5
🤖 Data & Models: https://t.co/TS0wyk7QbF
💻 Code: https://t.co/Q5ydfeIgKQ
🎯 Evaluation System: https://t.co/YIRzG2tzKk

Details 🧵👇

1. Why MegaScience?
While LLMs like o1 and DeepSeek-R1 excel at math & code, they still struggle with science reasoning, largely due to the lack of large-scale, high-quality datasets.

2. What makes MegaScience different?
We address 4 core challenges:
🧪 Unreliable benchmark evaluation
☢️ Less rigorous decontamination
❌ Low-quality reference answers
🧠 Superficial knowledge (data) distillation

3. We tackle this from the ground up.
First, we introduce TextbookReasoning:
📘 Built from 128K+ university-level science textbooks
⚙️ Fully automated LLM-driven pipeline
🧠 650K QA pairs with reliable reference answers
🌍 Covers 7 major disciplines

4. But we didn’t stop there.
We then construct MegaScience, a diverse, hybrid dataset of 1.25M QA pairs, using:
* TextbookReasoning
* NaturalReasoning
* Nemotron-Science

We conduct comprehensive ablation studies across different data selection methods to identify the optimal approach for each dataset, thereby contributing high-quality subsets.

5. To evaluate properly, we also open-sourced a reproducible and flexible Scientific Reasoning Evaluation framework with:
* 15 science reasoning tasks
* Multiple question formats (MCQ, calc, open-ended)
* Multi-GPU parallelism & model-agnostic evaluation
* Comprehensive answer extraction strategies

6. Results:
Models trained on MegaScience consistently outperform official Instruct versions, especially for Qwen3 series.
Bigger models see greater gains, showing strong scalability.

7. Everything is open-source:
📚 Dataset
🧪 Evaluation toolkit
🤖 Trained models
🔧 Codebase
→ Let’s build better science agents together!

This work is impossible without all the brilliant co-authors @SinclairWang1 @stefan_fee

about 1 year ago

@NielsRogge @OpenAI This is a very nice figure illustrating multi-turn reinforcement learning with tool use 👍 You might be interested in DeepResearcher, which we released earlier this month and which applies RL training with real-world web search. Paper: https://t.co/7dDU1uiHsO Code: https://t.co/uyGLGWRT1H

about 1 year ago

🙏 Huge thanks to my amazing co-authors: @zhengqi18496564, @fudayuan, @lino_cai, @ylmnshn1, @prl576951296911 and @stefan_fee for their incredible contributions to this work! 7/7

about 1 year ago

🔍 Excited to introduce DeepResearcher, the first end-to-end trained #DeepResearch model with #RL scaling in real-world environments! ✨ No more controlled simulations - this is RL in the wild with authentic search interactions! Paper: https://t.co/7dDU1uiHsO 1/7
