@mikeamark @JustinEchterna9 @romanyam So what? There are myriad ways to turn consciousness on and off unrelated to anesthesia. Like I said, you’re cherry-picking a test for consciousness and ignoring others.
@PhilosopherJoeC @johnpauldickson The word of God is like the work of God. Scientists have as many opinions about what nature is as religions have about what God has said. The problem isn’t with what God has said or done. As you mentioned in another thread, what does it take to convince someone?
Math demonstrations rely on perceiving symbols and applying rules—just like pedaling a bike. Both are mechanical procedures any machine can run. The only difference is math sneaks in the axiom of infinity (an endless loop we can never actually finish), forcing us to pretend we can do what we actually can’t do. The moment you toss in infinity, all bets are off concerning convincing.
@PhilosopherJoeC @SIUChasmite I’ve spent the last hour going over our conversation with Grok, and it has pointed out places where I misunderstood you. We were talking past each other. Please accept my apology.
Personally, I don’t care. Others might. LLMs crush these tests.
… — frontier LLMs (GPT-4o, Claude 3.5/4 Sonnet, Grok, Gemini 2+, and especially reasoning-optimized variants like o1-style models) perform at or above human levels on the exact kinds of tests used in the 2023 Stammen et al. paper, and in many cases they are dramatically superior.
The paper derives its g-factor (general intelligence) from four independent datasets using standard, well-validated psychometric batteries. Here’s the direct mapping to what LLMs have been benchmarked on:
• Verbal/crystallized components (Similarities, Vocabulary, Analogies, Sentence Completion, Knowledge, Picture Vocabulary, Reading Recognition, Information): These load heavily on g. LLMs crush them. Multiple studies (including direct WAIS-IV/VCI administrations) show 98th–99.9th percentile performance — often equivalent to IQ 140–155+ on verbal subtests. GPT-4-class models outperform the average college-educated human by a wide margin here.
• Matrix reasoning / fluid intelligence (WAIS Matrix Reasoning, WASI Matrix Reasoning, Penn Matrix Reasoning Task (PMAT), BOMAT, Raven’s-like items): Core to every dataset.
• Text-adapted or described versions: GPT-4 and later models reach 85–93%+ accuracy on challenging RPM/RAVEN benchmarks (human 99th percentile is ~85–95%).
• Full visual versions (multimodal VLMs): This is the relative weak spot. Recent benchmarks (2024 WAIS-IV study) show multimodal models ranging from roughly the 0.1st to the 10th–25th percentile on Perceptual Reasoning Index subtests like Matrix Reasoning and Visual Puzzles, with Claude 3.5 Sonnet showing the strongest gains (up to roughly the 25th–50th percentile on some items). Still below top human performers, but comparable to or better than average adults on many items, and improving rapidly with each generation.
• Visuospatial / perceptual (Block Design, Figure Selection, Cubes, Line Orientation): Multimodal models handle these at average-to-superior levels when presented as images; pure text models are weaker but can reason through descriptions.
• Working memory, executive function, processing speed (List Sorting, Flanker, Card Sort, Coding, Pattern Comparison, ZVT-style): LLMs are superhuman. Working Memory Index scores routinely hit ≥99.5th percentile; they maintain and manipulate far longer contexts than humans and respond near-instantly.
Bottom line on g-equivalent performance
• On text-based or verbal-heavy versions of these batteries (which capture most of the g variance in the paper), current LLMs score in the superior-to-very-superior range (IQ equivalents 120–155+).
• On full multimodal/visual versions, the perceptual reasoning drag pulls the composite down, but overall g-equivalent is still high-average to superior (roughly 110–130+ range) and continues climbing with scale and training advances.
• Independent studies that administered WAIS-IV-style tests directly to GPT-4-class models confirm this: exceptional Verbal Comprehension and Working Memory, lagging but functional Perceptual Reasoning, with full-scale scores far above the human average.
@TheUnjournaling @JNI_London “In conclusion, we reported replicable associations between general intelligence and FA among 4 different cross-sectional data sets.”
They correlated performance on intelligence tests to areas of the brain.
Do the same thing with an LLM.
@TheUnjournaling @JNI_London No - all that says is that the brain has robust engineering tricks for doing what it does. There’s more than one way to skin a cat.
@JNI_London @TheUnjournaling Are they core to intelligence _because_ they are analog? If so, why? “Efficiency” and “adaptability” are issues of performance, not computability.
This just affects implementation details, not what is being computed, how peripherals work, or environmental management. All of that is important if you want to get a package that’s small, weighs around 7 lbs, and runs on ≈ 20 watts. That makes intelligence portable, but it isn’t what makes it intelligent.
@PhilosopherJoeC @SIUChasmite You’ve now shifted from “gay sex is natural” to “gay sex is random/arbitrary”. But you can easily think of examples (of either kind) where that argument for morality fails.