Alexander Perez

@rohanpaul_ai

A 10 million parameter model just outperformed deterministic rivals 3 times its size by doing something regular recursive AI doesn't do: exploring multiple reasoning paths at the same time.

Most AI reasoning models are trapped on a single train of thought, and GRAM ("Generative Recursive Reasoning") is the first to break that by letting the model think in parallel universes simultaneously.

The problem is that all existing recursive models are fully deterministic, meaning given the same input they always follow the exact same reasoning path and can never escape a wrong trajectory or discover more than 1 valid answer.

GRAM fixes this by injecting learned randomness at each refinement step, so the model samples a slightly different direction each time rather than snapping to 1 fixed next state, which produces a spread of diverse reasoning trajectories.

At test time the model runs many of these paths in parallel and selects the best one using a small reward predictor trained alongside the main model, adding a "width" scaling axis on top of the usual "depth" axis of running more recursion steps.

On hard Sudoku puzzles, GRAM with 10M parameters hits 97% accuracy versus 87.4% for the best prior recursive model, and with only 20 parallel samples it outperforms every deterministic baseline even at 320 recursion steps.

On tasks with many valid answers like N-Queens, deterministic recursive models collapse as the number of solutions grows, while GRAM maintains near-perfect accuracy throughout.

The same stochastic framework also acts as a generator: given a blank board, GRAM produces valid Sudoku puzzles 99% of the time using 16 steps, versus 1,000 steps and 55M parameters for the best diffusion baseline at just 91%.

---

Paper Link – arxiv.org/abs/2605.19376v1
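To make the "width" axis concrete, here is a toy sketch on N-Queens, not the paper's method. It assumes a hand-written stochastic refinement step (a fixed `temperature` instead of learned randomness) and uses a conflict count as a stand-in for the learned reward predictor. All function names and parameters are hypothetical.

```python
import random

def conflicts(cols):
    """Number of attacking queen pairs; cols[r] is the queen's column in row r."""
    n = len(cols)
    return sum(
        1
        for a in range(n)
        for b in range(a + 1, n)
        if cols[a] == cols[b] or abs(cols[a] - cols[b]) == b - a
    )

def refine_step(cols, rng, temperature=0.3):
    """One stochastic refinement: re-place the queen of one random row."""
    n = len(cols)
    row = rng.randrange(n)
    if rng.random() < temperature:
        col = rng.randrange(n)  # injected randomness (hard-coded here, not learned)
    else:
        # Greedy move: pick a column with the fewest conflicts, ties broken at random.
        costs = [conflicts(cols[:row] + [c] + cols[row + 1:]) for c in range(n)]
        best = min(costs)
        col = rng.choice([c for c in range(n) if costs[c] == best])
    return cols[:row] + [col] + cols[row + 1:]

def run_trajectory(n, depth, rng):
    """Depth axis: apply `depth` refinement steps to a random starting board."""
    cols = [rng.randrange(n) for _ in range(n)]
    for _ in range(depth):
        cols = refine_step(cols, rng)
    return cols

def best_of_width(n, depth, width, seed=0):
    """Width axis: sample `width` trajectories and keep the best-scoring one."""
    rng = random.Random(seed)
    candidates = [run_trajectory(n, depth, rng) for _ in range(width)]
    # Stand-in for the reward predictor: fewer conflicts means higher reward.
    return min(candidates, key=conflicts)
```

Calling `best_of_width(8, depth=40, width=20)` returns the lowest-conflict board among 20 sampled trajectories. By construction it is never worse than any single trajectory in that batch, which is the intuition behind scaling width rather than only depth.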

xasima retweeted

Stanford NLP Group

@stanfordnlp

about 1 month ago

Lots of @stanfordnlp work at @icmlconf. See you in Seoul! 🇰🇷

Contextualized Privacy Defense for LLM Agents
Yule Wen, @StevenyzZhang, …, @Diyi_Yang

You gave your AI agent access to your email—it’s much more useful then. But how to maintain your privacy?

https://t.co/4ZGY2idfq8 https://t.co/bTYG5BqEZk

xasima retweeted

about 1 month ago

@AlphaSignalAI

A 30B model just hit gold-medal scores at the world's hardest math contest.

Olympiad math and physics are the hardest reasoning tests around.

Most gold-medal scores come from massive specialized systems built for one subject.

Shanghai AI Lab just released SU-01, a 30B open reasoning model.

It clears the medalist cutoff on the latest IMO, USAMO, and IPhO.

The recipe is the interesting part.

They turn a general backbone into a proof solver in three stages:

1. Curriculum fine-tuning on 338K traces
2. 200 steps of two-stage reinforcement
3. Generate, verify, and revise at inference

No code execution, no symbolic solvers, no external tools involved.

The model sustains over 100K tokens of natural-language proof per run.

Quality came from the training and inference loop, not from scaling parameters.

The weights and recipe are public.

What does this unlock for smaller open-source labs working on scientific reasoning?

xasima retweeted

Kursakov

@DenisKursakov

4 months ago

Prediction markets don't lie - they're systematically biased

and that bias has geometry

we built a calibration surface C(K, τ) - a 2D error map across strike K and horizon τ

turns out: markets overprice extremes. underprice short horizons. drift optimistic on long ones

this isn't noise - it's structure

C(K, τ) = C_K(K) + C_τ(τ) + C_int

smile + temporal drift + interaction

three layers of systematic error you can measure, decompose, and correct

when MCI(τ) < 0.80 - the market loses price discovery

it's not about whether markets "believe" something - it's about whether they know what they don't know
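One way to read the decomposition C(K, τ) = C_K(K) + C_τ(τ) + C_int is as a two-way additive split of an error grid. The sketch below assumes the surface is sampled on a rectangular grid and uses plain row and column means. It also splits out a constant offset (`grand`), which is not in the formula as written and could be folded into either marginal term. The function name is hypothetical and this is not the authors' estimator.

```python
def decompose_surface(C):
    """Split a grid C[i][j] (strike index i, horizon index j) into additive parts.

    Returns (grand, c_k, c_tau, c_int) such that
    C[i][j] == grand + c_k[i] + c_tau[j] + c_int[i][j].
    """
    n_k, n_t = len(C), len(C[0])
    grand = sum(sum(row) for row in C) / (n_k * n_t)
    # Strike effect ("smile"): how each strike row deviates from the overall mean.
    c_k = [sum(row) / n_t - grand for row in C]
    # Horizon effect ("temporal drift"): how each horizon column deviates.
    c_tau = [sum(C[i][j] for i in range(n_k)) / n_k - grand for j in range(n_t)]
    # Interaction: whatever the two marginal effects cannot explain.
    c_int = [
        [C[i][j] - grand - c_k[i] - c_tau[j] for j in range(n_t)]
        for i in range(n_k)
    ]
    return grand, c_k, c_tau, c_int
```

For a purely additive surface the interaction grid comes out as all zeros, so a large `c_int` entry flags a strike-horizon combination where the bias is not explained by the smile or the drift alone.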

xasima retweeted

Robert Youssef

@rryssf

4 months ago

Google DeepMind just used AlphaEvolve to breed entirely new game-theory algorithms that outperform ones humans spent years designing

the discovered algorithms use mechanisms so non-intuitive that no human researcher would have tried them.

here's what actually happened and why it matters:

xasima retweeted

Andrej Karpathy

@karpathy

4 months ago

Congrats on the launch @simile_ai ! (and I am excited to be involved as a small angel.) Simile is working on a really interesting, imo under-explored dimension of LLMs. Usually, the LLMs you talk to have a single, specific, crafted personality. But in principle, the native, primordial form of a pretrained LLM is that it is a simulation engine trained over the text of a highly diverse population of people on the internet. Why not lean into that statistical power: Why simulate one "person" when you could try to simulate a population? How do you build such a simulator? How do you manage its entropy? How faithful is it? How can it be useful? What emergent properties might arise of similes in loops? Imo these are very interesting, promising and under-explored topics and the team here is great. All the best!

xasima retweeted

Jackson Atkins

@JacksonAtkinsX

9 months ago

Microsoft and Georgia Tech gave existing models the ability to decide how to think.

The model brainstorms in latent space and only writes its thoughts when confident.

It makes them up to 6.78x more efficient.

Are we entering the age of latent reasoning?

Here's how it works:

- Monitor Confidence: The system, SwiReasoning, tracks the LLM's predictive entropy. High entropy means low confidence in the next step.

- Think Silently: When confidence is low, the AI switches to latent reasoning, exploring concepts as continuous soft embeddings instead of generating tokens.

- Write Aloud: Once confidence rises, it switches back to explicit Chain-of-Thought, generating tokens to lock in its logical path.

- Prevent Overthinking: A switch counter caps the number of cycles, forcing a conclusion to maximize token efficiency.

Result: A peak token efficiency gain of 6.78x over standard CoT and an average 56-79% efficiency gain on models like Qwen3-8B.
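The four steps above can be sketched as a small controller. This is only an illustration under assumptions: it uses a fixed entropy `threshold` to decide confidence (the tweet says only that entropy is tracked, not how the switch is triggered), and the names `plan_modes`, `threshold`, and `max_switches` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def plan_modes(distributions, threshold=1.0, max_switches=2):
    """Choose a reasoning mode for each step from its next-token distribution.

    High entropy (low confidence) asks for latent reasoning, low entropy for
    explicit token generation. Once the switch budget is spent, the controller
    stays in explicit mode so that an answer gets written out.
    """
    mode = "explicit"
    switches = 0
    modes = []
    for probs in distributions:
        if switches >= max_switches:
            mode = "explicit"  # budget spent: force a conclusion
        else:
            wanted = "latent" if entropy(probs) > threshold else "explicit"
            if wanted != mode:
                mode = wanted
                switches += 1
        modes.append(mode)
    return modes
```

With a confident distribution such as `[0.97, 0.01, 0.01, 0.01]` and an uncertain one such as `[0.25] * 4`, the controller enters latent mode on the first uncertain step, returns to explicit mode when confidence recovers, and ignores later uncertainty once both switches are used.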

xasima retweeted

Rohan Paul

@rohanpaul_ai

11 months ago

Most web agents still click around blindly because they never store real knowledge about page parts or user goals.

This work builds Web‑CogReasoner, an agent that learns in 3 clear rounds (memorize facts, grasp concepts, then practice procedures) and thinks through that stack before it moves.

The team first scraped 14 popular sites and shaped 12 tasks, giving 81K fact samples, 62K concept samples, and 27K procedure samples.

These pieces land in Web‑CogDataset, a curriculum that grows from naming a button to finishing a noisy multi step booking.

Training starts on Qwen2.5‑VL‑7B, adds factual labels, then concept summaries, then full action trails, each stage widening the context window to 8K tokens.

During inference the model writes a little diary that lists what it sees, what it means, and which click comes next, so every move is traceable.

A new exam called Web‑CogBench checks memory, understanding, and exploration.

Web‑CogReasoner scores 84% overall, beating Gemini 2.5 Pro by 4 points and the strongest open source rival by 12 points.

On live WebVoyager tasks it finishes 30% of jobs, up from 26% for OpenWebVoyager‑Max, and halves the gap with closed models.

An ablation study proves the Bloom style ladder matters, since adding facts lifts memory by 17 points, concepts lift understanding by 11 points, and procedures lift exploration by 7 points.

Overall this structured crash course in what, why, and how turns vague guesswork into reliable web navigation.

----

Paper – arxiv.org/abs/2508.01858

Paper Title: "Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents"

xasima retweeted

Yann LeCun

@ylecun

about 1 year ago

@paulosalem Human reasoning is not based on auto-regressive discrete symbol (token) prediction. It is based on the manipulation of mental models in continuous representation spaces. It is based on *searching* for a set of manipulations of this model to arrive at a particular result.

xasima retweeted

Chubby♨️

@kimmonismus

about 1 year ago

SEAL: LLM That Writes Its Own Updates Solves 72.5% of ARC-AGI Tasks—Up from 0%

This is a breakthrough that is rarely seen and could open up undreamt-of possibilities. In the following, I will go into more detail and summarize this breakthrough: https://t.co/bciATZzygF

xasima retweeted

Rohan Paul

@rohanpaul_ai

about 1 year ago

This paper analyzes advanced reasoning models' performance by examining their internal steps as reasoning graphs.

Methods 🔧:

→ Extract reasoning graphs by clustering hidden state representations for each reasoning step.

→ Measure graph properties like cyclicity, diameter, and small-world index.

→ Compare these properties between reasoning models and base models on mathematical reasoning tasks.

→ Analyze correlations between graph properties, task difficulty, model size, and supervised fine-tuning performance.

📌 Increased graph diameter reveals wider state exploration drives reasoning performance.

📌 Higher small-world index shows efficient local clustering and global reach in reasoning steps.

📌 Graph properties offer objective metrics for evaluating supervised fine-tuning dataset quality.

----------------------------

Paper - arxiv.org/abs/2506.05744v1

Paper Title: "Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties"
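As a rough illustration of the graph properties mentioned above, the sketch below assumes the clustering is already done, so each reasoning step comes labeled with a cluster id. It then builds a directed graph from consecutive steps and measures cyclicity and diameter. The function names and the exact definitions (for example, diameter over reachable ordered pairs) are assumptions and may differ from the paper's.

```python
from collections import deque

def build_graph(step_clusters):
    """Directed graph: one node per cluster, one edge per consecutive step pair."""
    graph = {c: set() for c in step_clusters}
    for a, b in zip(step_clusters, step_clusters[1:]):
        if a != b:  # ignore steps that stay inside the same cluster
            graph[a].add(b)
    return graph

def has_cycle(step_clusters):
    """True if the trajectory returns to a cluster after having left it."""
    seen, prev = set(), None
    for c in step_clusters:
        if c != prev and c in seen:
            return True
        seen.add(c)
        prev = c
    return False

def diameter(graph):
    """Longest shortest-path length over all reachable ordered node pairs."""
    longest = 0
    for source in graph:
        # Breadth-first search gives shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        longest = max(longest, max(dist.values()))
    return longest
```

For the step sequence `[0, 1, 2, 1, 3]` the trajectory revisits cluster 1, so `has_cycle` is true, and the farthest reachable pair is two hops apart.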

xasima retweeted

ℏεsam

@Hesamation

about 1 year ago

holy shit 😳

some scientists literally predicted the next Pope with math!

these researchers laid out Vatican’s hidden network:

- the connections
- influences
- and indirect relationships

they actually predicted Robert Prevost as a leading candidate before his election. https://t.co/HzDvRBcSox

xasima retweeted

Rohan Paul

@rohanpaul_ai

about 1 year ago

Super nice paper from Bytedance.

With this, AI learns to code better by first learning how to pick its own high-quality training data.

Releases Seed-Coder: Let the Code Model Curate Data for Itself

Current LLMs for code pretraining heavily rely on manual data curation, which is unscalable and biased.

Seed-Coder introduces LLMs to automate this data curation. Its model-centric pipeline scores and filters code, achieving 36.2% on MHPP with minimal human input.

📌 LLMs curating data for other LLMs cuts human bias, boosting scalability.

📌 Model-driven pipelines evaluate code quality more subtly than fixed rules.

📌 Tailored instruct and reasoning models excel at specific coding tasks.

Methods Explored in this Paper 🔧:

→ A model-centric data pipeline leverages LLMs for scoring and filtering diverse code data sources.

→ The instruct model undergoes supervised fine-tuning and direct preference optimization for better instruction following.

→ The reasoning model employs Long-Chain-of-Thought reinforcement learning to improve multi-step code reasoning.

xasima retweeted

Alexander Doria

@Dorialexander

about 1 year ago

So 30 minutes of train after the flight, I could really read a new frontier LLM paper. Oh looks who’s here.

xasima retweeted

Rohan Paul

@rohanpaul_ai

about 1 year ago

Existing Graph RAG (GraphRAG) methods struggle because they represent knowledge using only binary relations (linking two entities), missing complex real-world connections involving more than two entities.

This paper introduces HyperGraphRAG, which uses hypergraphs to model these multi-entity (n-ary) relationships directly using hyperedges, improving knowledge representation for LLMs.

HyperGraphRAG shows better accuracy, achieving higher Context Recall (e.g., 60.34 overall) and Answer Relevance (e.g., 85.15 in Medicine) than previous methods.

📌 Hypergraphs intrinsically model multi-entity facts, overcoming the information loss in binary graph representations.

📌 Dual vector retrieval (entities, hyperedges) enables precise fact finding and contextual expansion simultaneously.

📌 Capturing richer relations via hypergraphs improves accuracy, balancing slightly increased construction time and cost.

----------

Methods Explored in this Paper 🔧:

→ LLMs extract n-ary relational facts (hyperedges connecting multiple entities) from text to construct a knowledge hypergraph.

→ A bipartite graph structure stores the hypergraph efficiently in standard graph databases.

→ Vector embeddings represent both entities and hyperedges for semantic retrieval using similarity search.

→ A retrieval strategy first finds relevant entities based on the query, then expands to find connected hyperedges and related entities.

→ Generation combines retrieved hypergraph facts with traditional chunk-based retrieved text for a comprehensive final answer.

----------------------------

Paper - arxiv.org/abs/2503.21322

Paper Title: "HyperGraphRAG: RAG with Hypergraph-Structured Knowledge Representation"
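The bipartite storage and the entity-then-hyperedge expansion described above can be sketched with two plain dictionaries. This is a minimal illustration, not the paper's implementation: the class and method names are hypothetical, the example facts are made up, and the vector-similarity search that would pick the seed entities is left out.

```python
from collections import defaultdict

class Hypergraph:
    """Hypergraph stored as a bipartite entity <-> hyperedge incidence map."""

    def __init__(self):
        self.edge_to_entities = {}                # hyperedge id -> (text, entities)
        self.entity_to_edges = defaultdict(set)   # entity -> hyperedge ids

    def add_fact(self, edge_id, text, entities):
        """Store one n-ary fact as a hyperedge connecting all its entities."""
        self.edge_to_entities[edge_id] = (text, frozenset(entities))
        for entity in entities:
            self.entity_to_edges[entity].add(edge_id)

    def expand(self, seed_entities):
        """From seed entities, collect their hyperedges and the other entities on them."""
        edges = set()
        for entity in seed_entities:
            edges |= self.entity_to_edges.get(entity, set())
        related = set()
        for edge_id in edges:
            related |= self.edge_to_entities[edge_id][1]
        return sorted(edges), sorted(related - set(seed_entities))
```

A single hyperedge such as `add_fact("f1", "A, B and C interact", ["A", "B", "C"])` keeps the three-way fact intact, whereas a binary graph would have to split it into pairwise edges and lose the fact that all three belong together.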

xasima retweeted

Rohan Paul

@rohanpaul_ai

about 1 year ago

How LLMs acquire factual knowledge during training remains unclear.

This paper investigates these learning dynamics using synthetic biographies, revealing a three-phase process where models first learn statistics, plateau while forming attention circuits, and finally acquire specific facts.

The study proposes that data scheduling, such as initially using imbalanced data, can accelerate learning by shortening the plateau.

📌 Plateau phase builds essential recall circuits; data scheduling optimizes this pre-knowledge acquisition stage.

📌 Tailoring data distribution (imbalanced/uniform) to distinct learning phases strategically accelerates overall knowledge acquisition.

📌 Fine-tuning struggles because hallucinations emerge with knowledge, rapidly corrupting feed-forward associative memories.

----------

Methods Explored in this Paper 🔧:

→ Attention patching experiments, swapping attention patterns between models, demonstrate that recall circuits develop specifically during the performance plateau phase.

→ Fine-tuning struggles to add new knowledge because hallucinations emerge concurrently with learning, quickly degrading existing memories stored in feed-forward layers.

→ Imbalanced data distributions shorten the plateau phase, while uniform distributions are optimal for the later knowledge acquisition speed, presenting a trade-off.

----------------------------

Paper - arxiv.org/abs/2503.21676

Paper Title: "How do language models learn facts? Dynamics, curricula and hallucinations"

xasima retweeted

Rohan Paul

@rohanpaul_ai

over 1 year ago

Brain signals and LLM embeddings converge for predicting every spoken or heard word.

Beautiful research from @GoogleAI

They compared human brain activity during real conversations with internal embeddings from a speech-to-text LLM. Measured electrode signals in speech and language-related brain regions and matched them to the model’s word-level features.

🤖 Key Highlights

→ Brain activity aligns linearly with LLM embeddings for real-life spoken conversations.

→ Sequence of comprehension: first speech sounds, then word meaning.

→ Sequence of production: planned meaning, then articulation, then hearing one’s own voice.

→ Consistent predictive coding (pre-onset anticipation, post-onset surprise) mirrors LLM next-word prediction.

→ Lower-tier auditory regions still show partial sensitivity to semantic information.

🤖 Model-Brain Alignment

They observed a clear sequence: during comprehension, auditory cortex (superior temporal gyrus) showed strong correlation with speech embeddings, then language embeddings aligned with Broca’s area. During production, Broca’s area correlated with language embeddings before articulation, followed by motor cortex signals matching speech embeddings. This suggests that next-word prediction and higher-level meaning representation in the model parallel the brain’s approach.

⚙ So the study revealed a shared computational principle of predicting words in context. Even though the Transformer-based LLM processes words in parallel layers, the human brain processes them serially yet mirrors similar statistical regularities. This supports a “soft hierarchy” where both lower-level acoustic processing and higher-level semantic processing partially overlap in the brain.

xasima retweeted

Gappy (Giuseppe Paleologo)

@__paleologo

over 1 year ago

The Big Ideas (trivial or not, depends on you) are: 1. orthogonalize your alphas; 2. model alpha uncertainty. The first one gives you factor neutrality and separability of the optimization problem (=>closed-form solutions). The second one gives you more effective sizing. 5/
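One possible reading of "orthogonalize your alphas" is Gram-Schmidt: strip from each signal whatever earlier vectors already explain. The sketch below assumes a plain unweighted inner product over assets, with no risk-model weighting. Under that assumption, listing factor exposures first and alpha signals after them leaves the resulting alphas orthogonal to the factors and to each other. The function names are hypothetical and the thread may intend a different construction.

```python
def dot(u, v):
    """Plain (unweighted) inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, eps=1e-12):
    """Return mutually orthogonal vectors spanning the same space as `vectors`.

    Each vector has its projections onto all earlier results removed.
    Vectors that are fully explained by earlier ones are dropped.
    """
    ortho = []
    for v in vectors:
        residual = list(v)
        for q in ortho:
            coef = dot(residual, q) / dot(q, q)
            residual = [r - coef * qi for r, qi in zip(residual, q)]
        if dot(residual, residual) > eps:
            ortho.append(residual)
    return ortho
```

For example, `gram_schmidt([market, alpha_1, alpha_2])` with `market = [1.0, 1.0, 1.0, 1.0]` returns alphas whose entries sum to zero, which is one simple notion of neutrality to that factor.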

Alexander Perez

@xasima
