🚀 Excited to share our EMNLP 2025 Demo Paper!
MALLM: Multi-Agent Large Language Models Framework
Conduct your experiments on agents discussions, and decisions.
🎥 https://t.co/mgBKg8UlQx
👋 See our poster in Session 11 (Thu, Nov 6 · 16:30–18:00, Hall C)
#EMNLP2025
We will present our work "Stay Focused: Problem Drift in Multi-Agent Debate" at #EACL2026 this week.
🗓️ Our session is Friday (Poster Session 6, 11:00-12:30).
If you are around, feel free to stop by so we can have a chat or just say hi.
Paper: https://t.co/KmNqCj0AkT
🚀 Stay Focused: Problem Drift in Multi-Agent Debate
✅ Accepted at #EACL2026
🗣️ MAD drifts from the original problem over turns
🔎 Analysis across reasoning, knowledge & instruction-following
🛠️ DRIFTJudge (detect) & DRIFTPolicy (mitigate)
📄 Preprint: https://t.co/q3Jwe01Scn
🚀 MALLM: a plug-and-play framework for multi-agent debate.
✅ Accepted as #EMNLP2025 Demo
🔧 144+ configurations out of the box
🔎 Find the best multi-agent setup for your research
📄 Preprint: https://t.co/DrPFdYoUZ6
🧪 Demo: https://t.co/g3Qru5uDLj
GippLab attending #ACL2025NLP in Vienna this week!
📄 We presented three papers 🙌
🔗 Find the titles and links to all papers in the comments below👇
#ACL#ACL2025#ACL25
❓ What is Problem Drift in multi-agent debate?
This example shows how agents start with a good solution. However, it gets worse with longer debates.
One agent induces a logical error in the debate. The other agents agree without skepticism, leading to the wrong solution.
Stay Focused: Problem Drift in Multi-Agent Debate
Multi-agent LLMs are prone to making errors during longer interactions.
Check out how we define this "problem drift", investigate its reasons, and test detection and mitigation strategies at test-time.
https://t.co/q3Jwe02q1V
The ethical alignment of Multi-Agent LLMs (MALLM) collapses during ongoing discussions.
This raises concerns about AI safety, highlighting how multi-agent settings come with novel safety challenges that weren't relevant in single-agent scenarios.
Paper: https://t.co/lh8Ur7Ldgz
🚀MALLM🚀 - Conduct your own multi-agent research by using our new framework for problem-solving:
https://t.co/Pi3wjn8fsP
MALLM comes with a dataset loader, easy yet configurable discussion formats, and an integrated evaluation pipeline. 🥳
Multi-Agent LLMs for Conversational Task-Solving 💬
Contributions:
1) Taxonomy of multi-agent systems for task-solving
2) Multi-agent framework for your studies
3) Identifies three problems with multi-agent systems: Performance, Alignment, Monopolization
https://t.co/nz4a9JUbTV
@yang3kc Nice work! The more information you give in the prompt and the more the model has to care for during the generation, the less focus can be on the actual task. So it makes total sense that reasoning gets worse here.
Would be interesting to see other tasks evaluated like this too
@mckaywrigley I already use @cursor_ai a lot. Within minutes, I got bar charts, line graphs, and correlation matrices for my research project.
I just needed to adjust some little things to make it look nice.
8/8 🚀 Challenges:
We identify 9 overarching challenges in text generation.
These are bias, misuse, reasoning, hallucinations, privacy, transparency, interpretability, datasets, and computing.
For each, we survey state-of-the-art research and provide research directions.
📑 Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges
Explore recent advances in text generation since 2017, focusing on five core sub-tasks and highlighting key research gaps.
🔗 Read the paper: https://t.co/97IMuVnS7O
🧵 1/8
7/8 📊 Evaluation:
Researchers heavily rely on automated metrics.
We find that most works use n-gram overlap metrics like BLEU, ROUGE, and METEOR.
We raise awareness about other metrics to complement evaluation (statistical, graph-based, model-based).