🚀 Excited to share that our work, AMA-Bench, has been accepted to #ICML2026!
Most memory benchmarks test dialogue memory, but real agents learn through continuous interaction with their environment. We found that systems that ace dialogue benchmarks struggle badly in true agentic settings! 🤯
To fix this, we introduce AMA-Bench to evaluate long-horizon memory in real applications, plus AMA-Agent, a new system designed to track causality and objective information across long trajectories. 🧠
🌐 Check it out: https://t.co/3y2wyXwVyL
See you at ICML! 🎉
We are excited to release AMA-Bench 🎉
Our goal is to evaluate agent memory itself, not just dialogue.
Many existing memory benchmarks are still centered on conversation or long context. But real agent memory unfolds over long-horizon agent-environment trajectories, with machine-generated observations, evolving states, and causal structure.
AMA-Bench includes real-world + synthetic settings across multiple domains, and we also introduce AMA-Agent.
Paper: https://t.co/TcaK5ciIvm
Project: https://t.co/3y2wyXwVyL
Dataset: https://t.co/o4qsmRconS
#LLM #AgentMemory #Memory #Agent
Seedance-2 and Kling-3 signal that AI video generation is entering a “photorealistic” era, but realism does not guarantee sound reasoning or scientific correctness. In the classic experiment of breaking dry spaghetti, real fractures arise from elastic energy and stress waves, often producing three pieces. Yet models frequently generate physically incorrect snaps.
We introduce VideoScience-Bench to evaluate scientific reasoning in video generation. Most models produce convincing-looking videos but lack real scientific understanding; only Kling-3, Sora-2, and Veo-3 show weak signs of it.
Read the details in our blog 👇🧐🧪
https://t.co/Vm1ncKnm58
So excited to share this work, #TRANSIENTTABLES 🎉 Addressing temporal reasoning gaps in LLMs is crucial, and I'm honored to present our work on LLM temporal reasoning at @naaclmeeting. See you at #NAACL2025!
#NAACL2025 Paper
1/6 🚀 Thrilled to announce that our paper, "TRANSIENTTABLES: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables," has been accepted for an oral presentation at #NAACL2025! #TRANSIENTTABLES
🎉 We explore how LLMs handle information that changes over time in tables. Check it out: https://t.co/qoDyrWQ3Sp
I am in Singapore for #EMNLP2023.
Check out our poster, “Exploring the Numerical Reasoning Capabilities of Language Models: A Comprehensive Analysis on Tabular Data”.
We know SOTA LLMs struggle with numbers, but where exactly does the challenge lie? 🧵
https://t.co/j4ucdk4ek8
🌟 News: I'm on the academic job market! 🌟
🔍 Seeking: Tenure-track faculty positions in Computer Science. I work on natural language processing systems.
💡 Expertise: Elevating reasoning and inference capabilities over semi-structured tabular data.
#NLProc #AI #AcademicTwitter
Excited to share that our paper has been accepted at #EMNLP2023 🎉. In this work, we evaluate language models across various numerical reasoning tasks on tabular data.
It was a pleasure working with @akhtarmubashara, @keviv9, and Arpit.
Researchers/engineers from our #AI Engineering Group are co-authors on 4 papers at @aaclmeeting this week. Learn more about their #NLProc research, why the results are notable, and how their work will advance the state of the art in #NLProc.
https://t.co/PhN9VAFi1f
#AACL22
(4/4) Congrats to @IITGuwahati's Abhilash Reddy Shankarampeta, @UUtah's @keviv9 & @imsure318 on having their paper “Enhancing Tabular Reasoning with Pattern Exploiting Training” accepted at #aacl22!
4. Enhancing Tabular Reasoning with Pattern Exploiting Training, @suki_2022 (non-archival): we use PET (AdaPET) pre-training, i.e., an MLM objective trained jointly with the tabular inference task, on entity-style tabular data. Joint work with @areddys53. 8/n
The Animal-AI Olympics is an AI competition with tests inspired by animal cognition. The Animal-AI environment is a new AI experimentation and evaluation platform that implements ideas from animal cognition in order to better train AI agents that possess cognitive skills.
#Fake_News_Detection
Identifying fake news is one of the most challenging and open-ended tasks in AI today!
We have seen the devastating effects of fake news, especially now during the COVID-19 pandemic.