Ben Walker @benjaminwalker - Twitter Profile

Pinned Tweet

3 months ago

Take the same underlying path and increase the number of samples. As the sequence length grows, RNNs become difficult to train and Transformers become too expensive. Our continuous-time models instead converge to a continuous hidden-state path.

2

37

5

31

4K

Ben Walker

@benjaminwalker

about 1 month ago

Unless I’ve missed something, there are still no technical details on how they make the approach subquadratic. Anyone know how they choose which previous tokens a query should attend to without first looking at all previous tokens, or is the subquadratic claim just marketing?

Subquadratic

@subquadratic

about 1 month ago

The transformer architecture used for ChatGPT, Gemini, and Claude has defined the last decade of AI. It also introduced a fundamental constraint: compute scales quadratically as context grows. Longer inputs, exponentially higher costs and accuracy that degrades well before the context window limit. SubQ changes that. It's the first LLM that breaks the quadratic scaling constraint delivering longer context, higher accuracy, and lower cost at the same time without tradeoffs. Read more here. https://t.co/QZmk07GZrQ

35

360

40

119

49K

0

79

Ben Walker

@benjaminwalker

about 2 months ago

Using the new GPT-Image-2 to help me express what it feels like to watch a one-off specific instruction survive Codex compaction, and then get passed down forever as legend through each successive compaction

https://t.co/DV30yJ5nv4

0

1

0

113

Ben Walker

@benjaminwalker

about 2 months ago

Looks like Codex and ChatGPT are now down, and this is the first time I have seen a foreign language in a response. Is this somehow linked?

0

1

0

1K

Ben Walker

@benjaminwalker

about 2 months ago

Codex felt it could only express how totally declarative the dataclass definition should be in Russian

1

0

134

Ben Walker

@benjaminwalker

about 2 months ago

@MingchenZhuge @tydsh @karpathy @DrJimFan @SchmidhuberAI @_akhaliq @hardmaru @zechunliu @YoungXiong1 @HaoZhe65347 @cai_zhipeng What distinguishes a neural computer from a world model? If it learns the dynamics of computation, memory, I/O, and state updates well enough to reproduce the machine’s behaviour, then it seems like a world model where the “world” happens to be a computer.

1

5

0

725

Ben Walker

@benjaminwalker

about 2 months ago

That explains why gpt-5.4-codex is at capacity

Sam Altman

@sama

2 months ago

To celebrate 3 million weekly codex users, we are resetting usage limits. We will do this every million users up to 10 million. Happy building!

2K

27K

1K

2M

0

73

Ben Walker

@benjaminwalker

2 months ago

If only Yule had known about this when he invented autoregressive modelling in 1927, we could have had language models before computers!

Jack Morris

@jxmnop

2 months ago

Hate to break it to you, but the first LLM was created by Andrey Markov in 1913. he tallied up 20,000 letters from a famous novel and computed

p(vowel | vowel)
p(consonant | vowel)
p(vowel | consonant)
p(consonant | consonant)

basically 'training' a bigram by hand https://t.co/M1aLF9z2ik
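
The hand tally described above can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of Markov's procedure: the function name `vowel_consonant_bigram`, the English vowel set, and the sample string are all assumptions made for the example.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: English vowels, for illustration only


def vowel_consonant_bigram(text):
    """Estimate p(next class | current class) over letters.

    Non-letters are dropped first, so pairs are also counted across spaces.
    """
    classes = ["vowel" if ch in VOWELS else "consonant"
               for ch in text.lower() if ch.isalpha()]
    # Count adjacent (current, next) pairs and how often each class comes first.
    pair_counts = Counter(zip(classes, classes[1:]))
    first_counts = Counter(classes[:-1])
    return {
        (prev, nxt): pair_counts[(prev, nxt)] / first_counts[prev]
        for prev in ("vowel", "consonant")
        for nxt in ("vowel", "consonant")
        if first_counts[prev] > 0
    }


probs = vowel_consonant_bigram("banana bandana")
for (prev, nxt), p in sorted(probs.items()):
    print(f"p({nxt} | {prev}) = {p:.3f}")
```

Each row of the resulting table sums to one, which is all a two-state bigram model needs in order to generate a new vowel/consonant sequence.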

92

6K

446

2K

372K

0

1

0

103

Ben Walker

@benjaminwalker

2 months ago

Most machine learning is about finding the right feature extractor for a linear readout. Even an LLM.

0

9

0

5

1K

Ben Walker

@benjaminwalker

2 months ago

A visualisation of the idea behind rough path theory: a path is not fully described by its value. Each curve has the same area: when the number of circles doubles, their radius is scaled by 2^{-1/2}. The path's value converges to the straight line, but the total area does not.
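
The arithmetic behind the picture can be checked with a short sketch. It assumes each curve consists of n equal circles of radius r, so the enclosed area is n·π·r²; the function name and the starting values `n0` and `r0` are arbitrary choices for illustration, not details taken from the visualisation.

```python
import math


def circles_after_doublings(k, n0=1, r0=1.0):
    """Return (number of circles, radius, total enclosed area) after k doublings."""
    n = n0 * 2 ** k            # the number of circles doubles each time
    r = r0 * 2 ** (-k / 2)     # the radius shrinks by 2^{-1/2} each time
    return n, r, n * math.pi * r ** 2


for k in range(0, 11, 2):
    n, r, area = circles_after_doublings(k)
    print(f"n={n:5d}  radius={r:.4f}  total area={area:.4f}")
```

The radius, and with it the distance of the curve from the straight line, shrinks towards zero, while the total area stays at π·r0²·n0 for every k.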

0

26

1

14

3K

Ben Walker

@benjaminwalker

2 months ago

@xlr8harder My chat keeps using a LaTeX macro it invented in a previous chat

1

4

0

193

Ben Walker

@benjaminwalker

2 months ago

ChatGPT found a song I’d been trying to find for ages, then built a playlist around it that was way better than Spotify’s suggestions. How is an LLM better at music recommendation than a direct recommendation system?

0

86

benjaminwalker retweeted

Oxford Young Statisticians Seminar @OxfordYss

2 months ago

Fantastic OxYSS session with Emma Prevot (@OxfordStats) & @benjaminwalker (@OxUniMaths) on the intersection of Causal Inference and Continuous-Time ML. A vibrant discussion on the future of temporal modelling!

https://t.co/nmKMyKBgNo

1

4

1

0

83

Ben Walker

@benjaminwalker

2 months ago

New ChatGPT tell: the sentence uses a colon. This is, however, not the only one.

0

49

Ben Walker

@benjaminwalker

2 months ago

Everything starts to look like a path once you stare at it long enough

0

2

0

1

78

Ben Walker

@benjaminwalker

3 months ago

@ryu0000000001 A natural next step is developing path-to-path models that can handle irregular or over-sampled inputs. Currently, we use the Log-ODE method, but because it outputs a sequence (the solution only at interval endpoints), the resulting models cannot be stacked.

0

5

0

169

Ben Walker

@benjaminwalker

3 months ago

Want to know more? Log-NCDEs: efficient continuous-time sequence models with strong empirical performance. https://t.co/BzQqpTxMJS SLiCEs: parallel-in-time continuous-time models that don't sacrifice expressivity. https://t.co/Il9ZUJFlgw

0

6

0

9

246

Ben Walker

@benjaminwalker

3 months ago

@HochreiterSepp Never liked the term 'Linear RNN', as their updates are nonlinear in (h_t, x_t). The real bottleneck is restrictive structure that prevents hidden-state interactions. Without that restriction, Linear RNNs already have the expressivity for world modelling: https://t.co/Il9ZUJFlgw
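
One way to read the point about nonlinearity is with a toy update. The sketch below assumes a scalar recurrence h_t = a(x_t)·h_{t-1} + b·x_t with an input-dependent transition a(x) = w·x. This form, the function name `step`, and the constants are hypothetical and are not taken from the linked paper.

```python
def step(h, x, w=0.5, b=1.0):
    """One update h_t = a(x_t) * h_{t-1} + b * x_t with a(x) = w * x (hypothetical form)."""
    return (w * x) * h + b * x


h1, h2, x1, x2 = 1.0, 2.0, 3.0, 4.0

# Affine in h for a fixed input x: a change in the state is scaled by a(x).
linear_in_h = step(h1 + h2, x1) - step(h2, x1) == step(h1, x1) - step(0.0, x1)

# Not jointly linear in (h, x): additivity fails because of the product x * h.
joint_lhs = step(h1 + h2, x1 + x2)
joint_rhs = step(h1, x1) + step(h2, x2)
print(linear_in_h, joint_lhs, joint_rhs)
```

The two joint values differ because the update contains the product of input and state, which is the sense in which such a recurrence is nonlinear in the pair (h_t, x_t) even though it is called linear.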

0

9

1

7

2K

Ben Walker

@benjaminwalker

3 months ago

And in a learning setting, it can be understood as a SLiCE with a strong inductive bias for building a contextual memory of a path. Paper: https://t.co/5y3LxAHloU 3/3

0

51

Ben Walker

@benjaminwalker

3 months ago

Congratulations to our PhD student Alex on his first paper! The Exponentially-Weighted Signature. This new SLiCE architecture generalises the signature transform by introducing a trainable continuous-time attention over the history of a path. 1/3

1

0

94

Ben Walker

@benjaminwalker

3 months ago

It builds on the exponentially fading memory signature of Eduardo Abi Jaber and Dimitri Sotnikov by moving beyond channel-independent weighting of the past, allowing interactions between channels to shape how history is remembered. 2/3

1

0

61
