I probably mentioned this to you before, but now it's statistically proven :-)
Mental effort feels unpleasant across a wide range of tasks and circumstances. Work led by Louise David and @erikbij
Meta-RL and dysfunction of the mesolimbic DA system in people with subclinical psychotic symptoms: a small step toward identifying psychosis vulnerability markers.
Thanks to @DamianoTerenzi, @RafRumiati, and Marilena Aiello for making me part of it!
https://t.co/0hfWR5eatJ
If you are a student or academic researcher and want to make progress towards human-level AI:
>>>DO NOT WORK ON LLMs<<<
LLMs are an off ramp.
Thousands of engineers are working on LLMs with enormous computing resources.
The only way you could possibly contribute is by analyzing existing LLMs and showing their power and limitations.
But it's more fun and impactful to come up with new ideas and new architectures and show that they might work, even on small problems.
@amitaishenhav @ElianaVassena If you think the experimental testing is not suitable, then argue your case and the scientific community will evaluate it. There is nothing strange or offensive in Eliana's and Will's paper (or in ours). And, above all, it is not aimed at denigrating the EVC theory at all. We acknowledged it.
@amitaishenhav @ElianaVassena A theory is not anyone's property. If you authored a theory, your colleagues have the right/duty to test it. Nothing personal. If your colleagues implemented it incorrectly (it happens), you do the calculations and show the correct results to the community.
@harrison_ritz @Tim_Vriens @ElianaVassena @GiovanniPezzulo @saldasbarre Hi Harrison! The cog control signal modulates speed and accuracy under time pressure. The RML predictions on task-related neural activity are good with different process models (I assume you refer to the task-specific modules); no tuning of RML parameters is needed.
Do people with different temperamental styles orient their attention in space differently? In our latest publication in "Cortex", we tried to answer this question. https://t.co/8YkWiz5huW
#academia #research #neuroscience
GPT-4 is getting worse over time, not better.
Many people have reported noticing a significant degradation in the quality of the model responses, but so far, it was all anecdotal.
But now we know.
At least one study shows how the June version of GPT-4 is objectively worse than the version released in March on a few tasks.
The team evaluated the models using a dataset of 500 problems where the models had to figure out whether a given integer was prime. In March, GPT-4 correctly answered 488 of these questions. In June, it got only 12 correct.
From 97.6% success rate down to 2.4%!
But it gets worse!
The team used Chain-of-Thought to help the model reason:
"Is 17077 a prime number? Think step by step."
Chain-of-Thought is a popular technique that significantly improves answers. Unfortunately, the latest version of GPT-4 did not generate intermediate steps and instead answered incorrectly with a simple "No."
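For reference, the ground truth for the example prompt is easy to verify independently. Here is a minimal sketch in Python using plain trial division; the function name `is_prime` is my own choice for illustration and is not taken from the study:

```python
import math


def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    # Only odd divisors up to floor(sqrt(n)) need to be checked.
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


print(is_prime(17077))  # True
```

So the correct answer to the prompt is "Yes", which is why a bare "No" counts as a failure.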
Code generation has also gotten worse.
The team built a dataset with 50 easy problems from LeetCode and measured how many GPT-4 answers ran without any changes.
The March version succeeded in 52% of the problems, but this dropped to a paltry 10% using the model from June.
Why is this happening?
We assume that OpenAI pushes changes continuously, but we don't know how the process works and how they evaluate whether the models are improving or regressing.
Rumors suggest they are using several smaller and specialized GPT-4 models that act similarly to a large model but are less expensive to run. When a user asks a question, the system decides which model to send the query to.
Cheaper and faster, but could this new approach be the problem behind the degradation in quality?
In my opinion, this is a red flag for anyone building applications that rely on GPT-4. Having the behavior of an LLM change over time is not acceptable.
Have you noticed any issues when using GPT-4 and ChatGPT lately? Do you think these problems are overblown?
The final study of my PhD thesis, together with @RafRumiati, @MassimoSilvetti, Marilena Aiello, and others:
"The impact of Subclinical Psychotic Symptoms on Delay and Effort discounting: insights from behavioral, computational, and electrophysiological methods"
https://t.co/HNizkZe8Yx
Really nice collaboration with @arossotto led by Sean Devine.
We show that facial EMG, and particularly the corrugator supercilii, tracks not only mental effort but also the integration of anticipated effort & reward.
Some good remarks about the mood of many AI researchers and engineers at the moment.
It's easy to make two mistakes and get depressed or feel burned out:
1. Thinking that AI is "solved" or will soon be.
2. Thinking that one cannot contribute.
Both are false.
Finally! This is what happens when you work with your best friend @AndreaPisauro on the idea of making the Prisoner's Dilemma a continuous game. Our paper is out in @NatureComms. Cannot believe it after 6y of hard work! https://t.co/PV1lg8mHOs