@livgorton We have a solution to this problem, "CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation" (ACL'26 oral) led by @yeemanchoi and @eagle_hz ! Hope research reviewing platforms such as @openreviewnet can consider adopting the integration as well :)
Can LLMs cite like humans? 🧠 Meet CiteGuard 🛡️, our retrieval-augmented agent for faithful citation attribution. +17% over prior baselines and 68.1% on CiteME, near-human accuracy. #ACL2026 #ai4scientist #LLM
🚨 Excited to share that our paper CiteGuard https://t.co/1mEElRMkm8 is accepted to ACL 2026 (Main)!
LLMs are powerful for scientific writing, but up to 90% of their citations can be fabricated.
Why this matters + our solution:
📢 The First Call for Papers for EMNLP 2026 is officially out!
We welcome long & short papers featuring original research on empirical methods for NLP.
ARR Submission Deadline: May 25, 2026
Read the full CFP here: https://t.co/GU2kaISjUG
#EMNLP2026
We need more open, realistic agent environments for training and evaluating agents! 💡
But what are the important environments to build?
What are the infrastructure bottlenecks for these environments in training and evaluation, and how can we scale up the number of available environments?
Most importantly, how should we utilize these environments: RL or beyond?
If you're interested in discussing together, come join us at our workshop on "Scaling Environments for Agents" @NeurIPSConf tmr (7th Dec)! We have an amazing lineup of invited speakers and panelists, including Edward Grefenstette from Google DeepMind and Shuyan Zhou from Duke.
Also check out our latest survey paper on the topic led by Yuchen Huang: https://t.co/gvsdqkp6jf
The SEA Workshop at @NeurIPSConf 2025 is coming next Sunday. It seems we urgently need more open, realistic agent environments for training and evaluating agents. But what are the important environments to build? What are the infrastructure bottlenecks for these environments in training and evaluation? How can we scale up the number of available environments? And how should we use these environments, RL or beyond? These questions are still not clear.
We're bringing together an amazing list of speakers and panelists to spark the discussion: @egrefen, @Mike_A_Merrill, @mialon_gregoire, @deepaknathani11, @jl_marino, @syz0x1, @qhwang3, Anthony G. Cohn, Eric Sommerlade, and @fredsala. You won't want to miss it if you're around.
Also, huge thanks to our four sponsors, @TheInclusionAI (@AntLingAGI), @SnorkelAI, @SonicjobsApp, and @VmaxAI for their generous support!
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow: not just the final weights, but the entire training journey.
Best fully open 32B reasoning model & best 32B base model. 🧵
#EMNLP Keynote by @hengjinlp:
No more Processing. Time to Discover!
AI for Science is just so exciting! Let us make LLMs discover like true scientists: Observe → Think → Propose and Verify
(A pity to miss the talk. Photo from @May_F1_ @emnlpmeeting)
What if your policy could reason and think dynamically, especially about uncertainty, enabling better real-world behavior?
Introducing EBT-Policy, an instantiation of Energy-Based Transformers for Policies!
TLDR:
- EBT-Policy broadly outperforms Diffusion Policy in both simulated and real-world tasks, while using significantly fewer resources (up to 50x fewer) during both training and inference.
- EBT-Policy is the first vanilla Behavior Cloning approach to demonstrate emergent zero-shot retry behavior, recovering from failures/OOD states using only successful demos, with no retry data/training.
- EBT-Policy successfully learns uncertainty, enabling dynamic compute allocation for action sequences it's more uncertain about.
🧵 Thread:
Introducing UniDoc-Bench: The First Unified Benchmark for Document-Centric Multimodal RAG
Paper: https://t.co/33S6yvibzO
Real documents mix text, tables, and charts, but most RAG benchmarks test them in isolation. We built UniDoc-Bench to change that.
What's inside:
- 70K PDF pages across 8 domains
- 1,600 QA pairs grounding text, tables & images
- Fair comparison across 4 RAG paradigms
Key finding: Text-image fusion RAG (68.4%) beats both multimodal joint retrieval (64.1%) and single-modality approaches. Current multimodal embeddings still lag behind combining strong unimodal retrievers.
💻 Code: https://t.co/gzVHUqRb0i
Data: https://t.co/UndBiJ3Aqy
Work by Xiangyu Peng @beckypeng6, Can Qin @canqin001, Zeyuan Chen @ZeyuanChen, Ran Xu @stanleyran, Caiming Xiong @CaimingXiong, and Chien-Sheng Wu @jasonwu0731.
#FutureOfAI #EnterpriseAI #MultimodalAI #DocumentIntelligence
Multimodal conversational agents struggle to follow complex policies, which also impose a fixed computational cost.
We ask:
How can we achieve stronger policy-following behavior without having to include policies in-context?
Link: https://t.co/mIdhuPw6Cj 🧵 1/3
Our VISTA workshop at ICDM 2025 is still open for submissions!
If you're working on GenAI standards, legal constraints, copyright risks, & compliance, we'd love to see your papers! ✨
🧵 More information and submit:
🚨 Call for Papers: VISTA Workshop @ ICDM 2025 🚨
Nov 12, 2025 | Washington, DC
Explore GenAI standards, legal constraints, copyright risks, & compliance. Submit by Sep 5!
https://t.co/MjmZx8UunI
Speakers: V. Braverman, D. Atkinson, A. Li
#ICDM2025 #GenAI #AIStandards
🚨 Deadline Extended! 🚨
Our Scaling Environments for Agents 🧑‍💻🤖 workshop at @NeurIPSConf 2025 is still open for submissions!
If you're working on scaling, environments, or agents, we'd love to see your papers! ✨
New deadline: Sept 1st
🧵 More information and submit: