Xinyu Ye

@XinyeYee

Joined April 2026

43 Following

17 Followers

4 Posts

XinyeYee retweeted

Huaxiu Yao

@HuaxiuYaoML

23 days ago

🔥 AutoResearchClaw tech report + v0.5.0 just dropped. 12,300+⭐ on GitHub. Two big additions this release:

🧪 1/ Domain-Expert Agents in the experiment stage: Specialized agents for high-energy physics, biology, and more. Real domain tools + knowledge plugged in, not a generic LLM pretending to run experiments.

📊 2/ ARC-Bench: A 55-topic benchmark across ML, HEP, quantum physics, biology, and statistics. One of the broadest cross-disciplinary evaluations for autonomous research ever released.

🏆 The numbers:
→ Beats AI Scientist v2 by 54.7% on ARC-Bench
→ 7-mode HITL (human-in-the-loop) ablation: targeted intervention > full autonomy OR exhaustive oversight.

The thesis (still): real research isn't a pipeline. Hypotheses fail. Lessons compound. AutoResearchClaw is a research amplifier, not a paper generator.

📄 Tech report: https://t.co/e5FrGJLzrD
💻 Code: https://t.co/KLOcnzFYaD

Thanks to @itsJiaqiLiu and @StephenQS0710, who led the work, and to all other contributors: @HaonianJi, @lillianwei423, @XinyeYee, @richardxp888, @HaoqinT, @Xinyu2ML, @WeitongZhang, @jiahengzhang96, @LINJIEFUN, @linjunz_stat, @yuyinzhou_cs, @CaimingXiong, @james_y_zou, @ZhengBerkeley, @cihangxie, @dingmyu


12K

XinyeYee retweeted

Huaxiu Yao

@HuaxiuYaoML

20 days ago

Every memory system for LLM agents evolves what it stores. None evolves how it retrieves.

🧬 EvolveMem is out, now shipping inside the SimpleMem v0.3.0 update. Powered by AutoResearch: the system researches its own retrieval, treating the full retrieval config as a structured action space and running a closed loop: evaluate ➜ diagnose ➜ propose ➜ validate ➜ repeat.

🔬 From a minimal baseline, 7 autonomous rounds produce a retrieval policy that beats the strongest published baseline by +25.7% on LoCoMo and +18.9% on MemBench.

🧬 It discovers entirely new retrieval dimensions not present in the original design, all integrated into the unified SimpleMem package.

📄 Paper: https://t.co/BWCXebWhG1
💻 Code: https://t.co/hhdgvVjblP

Led by @itsJiaqiLiu and @XinyeYee, with contributions from @richardxp888, @ZhengBerkeley, @cihangxie
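The closed loop described in the post (evaluate ➜ diagnose ➜ propose ➜ validate ➜ repeat) can be pictured with a toy sketch. Everything here is an assumption made for illustration: `RetrievalConfig`, its fields, the scoring function, and the random "diagnosis" are hypothetical stand-ins, not SimpleMem's or EvolveMem's actual API or method. The sketch only shows the shape of a greedy search over a structured config space.

```python
import random
from dataclasses import dataclass, replace


# Hypothetical retrieval config; field names are illustrative only.
@dataclass(frozen=True)
class RetrievalConfig:
    top_k: int = 4
    use_rerank: bool = False
    keyword_weight: float = 0.0


def evaluate(cfg: RetrievalConfig) -> float:
    """Toy stand-in for a benchmark score (higher is better)."""
    score = 1.0 - abs(cfg.top_k - 10) / 10
    score += 0.5 if cfg.use_rerank else 0.0
    score -= abs(cfg.keyword_weight - 0.3)
    return score


def diagnose(cfg: RetrievalConfig, rng: random.Random) -> str:
    """Toy diagnosis: pick one config dimension to revisit at random."""
    return rng.choice(["top_k", "use_rerank", "keyword_weight"])


def propose(cfg: RetrievalConfig, dim: str, rng: random.Random) -> RetrievalConfig:
    """Propose a single-dimension change to the current config."""
    if dim == "top_k":
        return replace(cfg, top_k=max(1, cfg.top_k + rng.choice([-2, 2])))
    if dim == "use_rerank":
        return replace(cfg, use_rerank=not cfg.use_rerank)
    weight = min(1.0, max(0.0, cfg.keyword_weight + rng.choice([-0.1, 0.1])))
    return replace(cfg, keyword_weight=round(weight, 2))


def evolve(cfg: RetrievalConfig, rounds: int = 7, seed: int = 0):
    """Run the loop; keep a candidate only if it validates as an improvement."""
    rng = random.Random(seed)
    best, best_score = cfg, evaluate(cfg)
    history = [best_score]
    for _ in range(rounds):
        dim = diagnose(best, rng)
        candidate = propose(best, dim, rng)
        candidate_score = evaluate(candidate)
        if candidate_score > best_score:  # validate before adopting
            best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, history
```

Calling `evolve(RetrievalConfig())` returns the best config found and the per-round best score. Because a candidate is adopted only when it scores higher, the score history never decreases; a real system would replace the random diagnosis with an analysis of actual failure cases.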


423

376

29K

XinyeYee retweeted

Huaxiu Yao

@HuaxiuYaoML

2 months ago

🔥 New paper: Omni-SimpleMem

🧠 Multimodal lifelong memory for AI agents: text, image, audio & video.

📈 Results:
🏆 LoCoMo F1: +57% over Mem0 / Claude-Mem
🏆 Mem-Gallery F1: +165% over Mem0 / Claude-Mem
⚡ 3.5x faster retrieval

🔬 How it was built: AutoResearchClaw's Human-in-the-Loop Co-Pilot mode:
🧑‍🔬 Humans set the research direction
🤖 AI agents ran ~50 experiments autonomously
🐛 Found bugs worth +175% F1
🏗️ Redesigned architecture
✍️ Optimized prompts humans missed

📄 Paper: https://t.co/B7JlpgcUTp
💻 Code: https://t.co/9bWRtZxsKH

Led by @JiaqiLiu835914, and nice work w/ @YanqingLiu83931, @StephenQS0710, @lillianwei423, @richardxp888, @HaoqinT, @ZhengBerkeley, @cihangxie, @dingmyu


XinyeYee retweeted

Huaxiu Yao

@HuaxiuYaoML

2 months ago

🚀 Introducing AutoHarness (「Aha」): automated harness engineering for AI agents.

In LLM training, the aha moment is when a model learns to reason. For agents, it's when a better harness makes the same model shine.

Agent = Model + Harness. The model reasons. The harness does everything else:

🧠 Context management
🛡️ Tool governance
💰 Cost control
👁️ Observability
💾 Session persistence

These are the patterns that separate a toy from a system. AutoHarness automates this entire layer.

🔧 What's inside:
- 6-step tool pipeline: parse → classify → permit → execute → sanitize → audit
- 3 modes (Core / Standard / Enhanced), from lightweight to full-featured
- Smart context management with token budgeting and multi-layer compression
- Full observability: per-call cost tracking, JSONL audit trail, trace diagnostics
- Multi-agent profiles with role-based permissions
- Any LLM provider

Every agent deserves its aha moment.

Led by @JiaqiLiu835914, and kudos to the team: @XinyeYee, @richardxp888, @lillianwei423, @HaoqinT, @Xinyu2ML, @yuyinzhou_cs, @ZhengBerkeley, @dingmyu, @cihangxie, etc.
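The 6-step tool pipeline named in the post (parse → classify → permit → execute → sanitize → audit) might look roughly like the sketch below. This is not AutoHarness's API: the tool registry, the risk labels, the permission policy, and every function name are hypothetical, chosen only to show how the stages could chain and how each call could leave one JSON line in an audit trail.

```python
import json
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


# Hypothetical tool registry and policy, for illustration only.
TOOLS = {
    "echo": lambda args: str(args.get("text", "")),
    "delete_file": lambda args: "deleted",
}
RISK = {"echo": "low", "delete_file": "high"}
ALLOWED_RISK = {"low"}


def parse(raw: str) -> ToolCall:
    """Step 1: turn the model's raw JSON request into a structured call."""
    data = json.loads(raw)
    return ToolCall(name=data["name"], args=data.get("args", {}))


def classify(call: ToolCall) -> str:
    """Step 2: assign a risk level; unknown tools default to high risk."""
    return RISK.get(call.name, "high")


def permit(risk: str) -> bool:
    """Step 3: check the risk level against the active policy."""
    return risk in ALLOWED_RISK


def execute(call: ToolCall) -> str:
    """Step 4: run the tool."""
    return TOOLS[call.name](call.args)


def sanitize(output: str, limit: int = 200) -> str:
    """Step 5: strip NUL bytes and truncate before returning to the model."""
    return output.replace("\x00", "")[:limit]


def run_pipeline(raw: str, audit_log: list):
    """Run all six steps; step 6 appends one JSON line per call."""
    call = parse(raw)
    risk = classify(call)
    allowed = permit(risk)
    result = sanitize(execute(call)) if allowed else None
    audit_log.append(json.dumps({"tool": call.name, "risk": risk, "allowed": allowed}))
    return result
```

With an empty list as the log, `run_pipeline('{"name": "echo", "args": {"text": "hi"}}', log)` returns the echoed text, while a `delete_file` request is refused and returns `None`. Both calls are recorded, so the audit trail covers denied requests as well as executed ones.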


165

158

27K
