New preprint!
We apply classic visual search paradigms from cognitive psychology to multimodal LLMs. Instead of just benchmarking outputs, we probe the human-likeness of their responses. How do models like GPT-4o, Claude and Llama respond to simple vs. compound visual features? A thread 🧵
We think this type of work is valuable: rather than identifying whether a model *can* perform some task, we want to get a better picture of the computational/cognitive processes driving performance. This sits somewhere between benchmarking and Mechanistic Interpretability; we try to see what the model is doing at a higher level of abstraction.
@dioscuri I admit I was surprised and saddened when I found out his level of analysis was really the only thing he had time to do. It's one of those things that's really simple in retrospect but takes a special something to come up with: the hallmark of a good idea.
#IJCAI2025 John Burden, University of Cambridge, delivering their talk #8954 on Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture at the Machine Learning (1/4).
@Jsevillamol This seems valid... But also somewhat sad. I don't want AI for business, I want it for general wellbeing and reducing the load on working people.
@fchollet Shameless self-plug here on this topic
https://t.co/srdu6C9iwl
We make a very similar argument in this paper from a legal perspective on the right to have access to unpolluted data.
@BlackHC @OwainEvans_UK Moreover, there's the whole "inference between the gaps" thing (another banger from @OwainEvans_UK). What does an inferential world look like without any traces of these papers/stories?
@BlackHC @OwainEvans_UK I agree, but I'm not sure how you do this without guaranteeing some kind of screwup. Thinking about canary codes from BIG-bench etc.
@RosieCampbell Super interesting and a bit disturbing about human psychology. But what do we do about this a few years down the line when (presumably) this effect is much more widespread?
@robinhanson I was waiting to see what term you were using CPR for and then have a good laugh at myself for thinking it was... whatever CPR actually stands for. Man, am I disappointed.