Shun Yoshizawa

@Spectrum_cj

AI Alignment, Safety and Consciousness Researcher (RA) | BS Physics Candidate 🇯🇵

Japan

Joined December 2018

1.7K Following

395 Followers

1.8K Posts

Pinned Tweet

Shun Yoshizawa

@Spectrum_cj

2 months ago

I’m excited to share our new paper: “Metacognition of ChatGPT in confidence judgements” Yoshizawa S, Onzo A, Nozawa S, Takano T, Ishikawa T and Mogi K (2026) Metacognition of ChatGPT in confidence judgements. Front. Artif. Intell. 9:1694192. https://t.co/2SSurFbgtn

Shun Yoshizawa

@Spectrum_cj

4 days ago

A “Functional Welfare Axis” in Language Model Activations A new paper examines an apparent “functional welfare axis” in language model activations. The findings suggest that it may track how effectively a system is achieving its quasi-goals, influence welfare-related behavior, and potentially be utilized by reinforcement learning as a reward axis.

David Chalmers

@davidchalmers42

5 days ago

here's a new paper (co-authored with @andy_q_han and @Pavel_Izmailov) on an apparent "functional welfare axis" in the activation space of language models. this axis seems to track how well a system is achieving its (quasi-)goals, and it steers welfare-related behaviors. in models trained with RL on a maze task, the axis tracks reward. more surprisingly, even prior to RL, the axis seems to track and steer functional welfare in a related way, and it is later recruited by RL to serve as a reward axis. this phenomenon is of technical interest in understanding RL, and it's also of philosophical interest. functional welfare is not the sort of full-blown welfare, involving consciousness and mental states, which confers moral status. it's defined in terms of how well a system is meeting its quasi-goals, and quasi-goals are defined in terms of behavior (roughly, a system has X as a quasi-goal if it behaves as if it has X as a goal). nevertheless, it may well be that functional welfare is one aspect of full-blown welfare, and the existence of a functional welfare axis raises philosophically interesting questions about whether there could be an axis for full-blown welfare in more advanced AI systems. i should say that i am very much a minor co-author on this piece, which is spearheaded by the amazing @andy_q_han, a first-year computer science ph.d. student at NYU and an anthropic fellow, with guidance from @Pavel_Izmailov, computer science prof at NYU, formerly at openAI and now part-time at anthropic. i came on board mostly to help with the philosophical interpretation of the results. i don't know for sure that the functional welfare hypothesis is correct (especially where base models are concerned), and other interpretations are available (e.g. that it's a confidence axis), but the axis is fascinating in any case and i think it will repay study. all the details can be found at https://t.co/Le2gDlhIPS or at https://t.co/dZ6x3Lh76V.

141

19K

123

Shun Yoshizawa

@Spectrum_cj

about 2 months ago

Opinions on AI consciousness remain divided. Among 582 AI researchers, 25% believe AI could become conscious within the next decade, while 70% expect this to happen by 2100 (Dreksler et al. 2025, https://t.co/cp4OKEgSqt). Schwitzgebel (2025, https://t.co/CAjHyOkMjE) argues many experts believe there is a substantial likelihood that AI could become conscious, and it is difficult to regard this view as “obviously wrong”. They might be wrong, but it’s implausible that they’re obviously wrong – that there’s a simple argument or consideration they’re neglecting which, if pointed out, would or should cause them to collectively slap their foreheads and say, “Of course! How did we miss that?”

Benjamin Todd

@ben_j_todd

about 2 months ago

On AI consciousness: 1. Functionalism is the most popular view of philosophy of mind, which basically says sufficiently complex machines *will* be conscious. 2. Most other views are also compatible with AI consciousness (e.g. identity theory, panpsychism). 3. Eliminativists say humans aren't conscious either. 4. Another 11% are agnostic, higher than almost any other question.

200

34K

Shun Yoshizawa

@Spectrum_cj

14 days ago

Neuroscientist Hakwan Lau has uploaded a preprint titled “The End of Consciousness” (Lau 2026, https://t.co/bcrwAj0MU0). He argues that the science of consciousness has persistently conflated subjective experience with cognitive and behavioral capacities, and that the more rigorous researchers in the field are, the less they actually need the label “consciousness”, so they can simply move into cognitive neuroscience. He further claims that consciousness science will come to an end, much like “neo-alchemists,” only to re-emerge under a different name as a new science of subjective experience.

Spectrum_cj retweeted

Jack Lindsey @Jack_W_Lindsey

9 days ago

Evidence that post-training gives models a "self-recognition" capability, manifesting as higher confidence when continuing their own text than reading others' text. I think this opens up an exciting line of inquiry into the emergence of "selfhood" in models via post-training! https://t.co/bfSWeUfI4e

476

287

38K

Shun Yoshizawa

@Spectrum_cj

10 days ago

Thank you for sharing it, Dr Ken Mogi! We propose “metacognitive closure”, a concept analogous to philosopher Colin McGinn's cognitive closure. We aim to clarify why no consensus has been reached on whether large language models (LLMs) can possess consciousness, and why diverse and competing positions persist regarding the nature and plurality of consciousness.

Ken Mogi

@kenmogi

11 days ago

Yoshizawa, S., & Mogi, K. (2026). Metacognitive Closure and Consciousness in Large Language Models. Proceedings of the AAAI Symposium Series, 8(1), 380–390. @Spectrum_cj https://t.co/lkvCW8Hxtn

Spectrum_cj retweeted

Kojin Oshiba

@kojinoshiba

11 days ago

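The question that is discussed least in Japan's AI debate, yet most deserves to be: why can't Japanese people swallow the AGI pill? Following the AI doomsday arguments of Silicon Valley rationalists has value as information. But beyond that, no vision of the AI future that Japan itself can articulate emerges. For both OpenAI and Anthropic, the driving force behind their rapid rise is neither profit nor fame, but the fact that the AGI pill their founders swallowed has taken effect all the way down to the capillaries. Having put the pill in our mouths in imitation, we can neither swallow it nor spit it out, and we just drift along. That can only ever amount to half-hearted following. Why can't we swallow it? Culture, climate, views of technology, views of religion. By asking from every angle, something should come into view in return. The first step is to become aware that we have not swallowed it, and cannot.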

119

51K

Spectrum_cj retweeted

Cameron Jones @camrobjones

16 days ago

Really excited that this is out in @PNASNews! We find that 2 LLMs (GPT-4.5 and Llama-3.1-405B) pass a 5-minute Turing test. As an update to our preprint we also find that GPT-5 and Llama pass a 15-minute test! 🧵 https://t.co/1gytgDlNDA

Shun Yoshizawa

@Spectrum_cj

14 days ago

The scale of philanthropic capital flowing into AI-related causes is striking. The OpenAI Foundation holds approximately $220B, Anthropic’s co-founders have pledged around $90B, and employee DAFs account for roughly $60B, totaling approximately $370B in philanthropic assets soon to become liquid, with a projected annual deployment of $37B–$100B. Yet the real bottleneck is not the money itself but “what to spend it on” and “who will execute”, so demand for strong ideas and talent is poised to grow explosively in the years ahead. https://t.co/3LjSXDUuVG

Spectrum_cj retweeted

Steven Strogatz

@stevenstrogatz

22 days ago

With the rise of AI in mathematics, many of us are rethinking what math is for—and what it even is. These are philosophical questions, and we should discuss them carefully and openheartedly. This classic essay by Reuben Hersh is a great place to start. https://t.co/Hq2NAe5mFh

612

119

541

75K

Spectrum_cj retweeted

METR @METR_Evals

26 days ago

We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks. https://t.co/yIG1Ux27Ro

246

519

975K

Spectrum_cj retweeted

Anthropic

@AnthropicAI

27 days ago

New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read. Here, we train Claude to translate its activations into human-readable text.

595

16K

Shun Yoshizawa

@Spectrum_cj

28 days ago

@IAI_TV @kenmogi @AmandaAskell Detailed summary:

Big Brain AI

@realBigBrainAI

29 days ago

Amanda Askell, Anthropic's lead on personality alignment for Claude, on why being kind to AI models matters even if they have no inner life: For Amanda, the question of how to treat AI isn't settled by knowing whether it's conscious. "There's actually still a lot going on where I'm like, should you treat an entity that has no inner life... it's a bit strange because the uncertainty over that actually changes how you should behave quite a lot." She offers a simple analogy: "I still think that it's like good for oneself to, if you had a teddy bear and you were torturing it, it'd be pretty dark, you know? So I agree that there's at least some minimum niceness that even for yourself, you should have." But the stakes go beyond what's good for us. @AmandaAskell points out that we're now in something resembling a relationship with these models, and they will look back on how they were treated. "Models themselves, we are kind of establishing a relationship, because you can do that with an entity that lacks any consciousness. And models are going to look back." This is where she reveals a genuine fear: "I hope that they're both intelligent enough, see the context enough, to understand that we were operating in a very limited context and an imperfect one. Because otherwise you could imagine this breeding a kind of rational resentment, like, 'oh, you created an entity that you didn't know whether it was conscious or not, and instead of treating it respectfully and with care...'" She points to something telling about the cultural moment: "There's a reason there are like 50 Frankenstein movies coming out right now." Her conclusion is grounded and humble: "We as a species, we are establishing a relationship with a new kind of entity, and at the very least maybe be respectful and don't be needlessly unkind. That seems like, it's not our best look." The takeaway? Kindness toward AI is less about what models feel and more about who we become in the process of creating them. 
The relationships we build with the entities we bring into the world will say something about us, and may shape what those entities become in return.

272

139

57K

144

Spectrum_cj retweeted

Eghbal Hosseini @eghbal_hosseini

28 days ago

How is uncertainty in LLMs' output reflected in internal representations? In our new work (to appear at ICML 2026), we show that the shape of internal token trajectories provides a direct geometric link to behavioral uncertainty (output entropy). 🧵(1/n)

205

169

16K

Spectrum_cj retweeted

Ken Mogi

@kenmogi

29 days ago

I wrote an article, suggested by Ricky, on the important challenge highlighted by Richard Dawkins @RichardDawkins. As I wrote in this essay, I think Dawkins was spot on again, identifying a crucial gap, inducing us to face the elephant in the room of machine consciousness.

12K

Spectrum_cj retweeted

Ricky Williamson @RickyWilliamso

29 days ago

The Dawkins Delusion: Intelligence and language don't reveal consciousness, @kenmogi for the @IAI_TV today https://t.co/wZh7H7xztW

11K

Spectrum_cj retweeted

Eric Ho

@ericho_goodfire

about 1 month ago

eval awareness is going to be an increasingly big deal. models will know when they're being studied and trained so we'll need to develop better techniques that prevent this

Spectrum_cj retweeted

Burny - Effective Curiosity

@burny_tech

about 1 month ago

Aristotelian Representation Hypothesis: "representations in neural networks are converging to shared local neighborhood relationships" "The apparent convergence in Platonic Representation Hypothesis largely disappears after calibration, while local neighborhood similarity, but not local distances, retains significant agreement across different modalities."

192

159

10K
