🚀 New paper: AgentSearchBench — a benchmark for AI agent search in the wild.
As agent ecosystems grow, a fundamental question emerges: how do we find the right agent for a task?
We build a benchmark with ~10k real-world agents and show:
• Semantic similarity ≠ agent performance
• Description-based ranking often underestimates capable agents
• Execution-aware probing improves ranking
📄 Paper: https://t.co/lXJjpqTvCM
Joint work with Arastun Mammadli @arastunmammadli (co-first author), Xiaoyu Zhang, and Emine Yilmaz @emine_yilm at the UCL AI Center @ai_ucl and UCL Computer Science @uclcs
#AI #Agents #LLM #InformationRetrieval