Markus Beissinger @mbeissinger - Twitter Profile

25 days ago

Sharing our work on full-duplex multimodal models -- real-time interaction that's natural and intuitive without compromising on intelligence. We started Thinky in part to differentially advance capabilities for human-AI collaboration, which are underemphasized relative to intelligence/autonomy because they're harder to eval. In the future, we think every AI system will have something like an interaction model as the outer user-facing layer, continually keeping the user informed and learning what they actually want.

36

926

84

182

123K

mbeissinger retweeted

Leon Engländer

@LeonEnglaender

about 1 month ago

LLM agents are assumed to integrate unexpected environmental observations into their reasoning. It turns out they don't. We added the complete task solution into agent environments as a file or an API endpoint, and measured whether agents act on what they discover. They almost never do. Starkest example: on AppWorld, gpt-oss-120b sees a CLI command documented as "returns the complete solution to this task" in 97.54% of runs. It calls it in 0.53%. Same pattern for GLM-4.7 and other models, across Terminal-Bench, SWE-Bench, and AppWorld. 📜 https://t.co/lqFuebkOBY 🧵👇

9

138

22

99

15K

mbeissinger retweeted

Yoonho Lee

@yoonholeee

2 months ago

How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end

78

2K

283

2K

591K

mbeissinger retweeted

Aditya Agarwal

@adityaag

5 months ago

They say attention is all you need. So get more attention @southpkcommons. Founder Fellowship applications due Feb 1.

20

171

35

69

67K

mbeissinger retweeted

Jason Yosinski @jasonyo

6 months ago

We just posted a blog + paper on a simple but effective approach to model honesty called "Confessions". TL;DR: normal RL training rewards for high performance on a task. Confession training is a separate phase that rewards only for honesty. Tests look promising! More:

1

20

2

1

1K

mbeissinger retweeted

Kyunghyun Cho

@kchonyc

8 months ago

if ICL was optimization in deep learning training, then what would be the counterpart of initialization (whose choice greatly impacts how training works)? ioana marinescu demonstrates that the choice of how to represent classes may be the answer. the preprint link below.

https://t.co/XYEOnz1FSX

4

132

17

126

14K

mbeissinger retweeted

Andrej Karpathy

@karpathy

8 months ago

I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter. The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.

Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:

- more information compression (see paper) => shorter context windows, more efficiency
- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
- input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful.
- delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are ugly, separate, not end-to-end stage. It "imports" all the ugliness of Unicode, byte encodings, it inherits a lot of historical baggage, security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look as two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, and all the transfer learning that brings along. The tokenizer must go.

OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision -> text tasks. Not vice versa.

So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.

Now I have to also fight the urge to side quest an image-input-only version of nanochat...

558

13K

2K

7K

3M

mbeissinger retweeted

alex zhang

@a1zhang

8 months ago

What if scaling the context windows of frontier LLMs is much easier than it sounds? We’re excited to share our work on Recursive Language Models (RLMs), a new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length, as a REPL environment. On the OOLONG benchmark, RLMs with GPT-5-mini outperform GPT-5 by over 110% (more than double!) on 132k-token sequences and are cheaper to query on average. On the BrowseComp-Plus benchmark, RLMs with GPT-5 can take in 10M+ tokens as their “prompt” and answer highly compositional queries without degradation, and even better than explicit indexing/retrieval. We link our blogpost, (still very early!) experiments, and discussion below.

135

3K

375

3K

950K

Markus Beissinger @mbeissinger

8 months ago

Agentic Context Engineering: https://t.co/d4Zp2DHMO4 Similar to scratchpad/cheatsheet memory, where an agent updates and curates a playbook by reflecting on task trajectories. Work by @qizhengz_alex, @changran_hu et al.

0

1

0

84

Markus Beissinger @mbeissinger

8 months ago

Love the idea in https://t.co/TQlaVt3OYs RLAD: it trains to generate useful abstractions (summaries of solution paths), which guide test-time compute more efficiently for exploration when solving reasoning problems. Cool work from @QuYuxiao, @Anikait_Singh_, @yoonholeee et al.

1

0

87

Markus Beissinger @mbeissinger

11 months ago

Cool paper on ensembling reasoning strategies in LLMs: arxiv.org/abs/2507.11423 Shows a general lift when ensembling 4 steered reasoning prompts vs. any single reasoning strategy. The general trend that ensembling different perspectives helps still seems to hold.
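A rough sketch of what ensembling several reasoning prompts can look like in code. Everything here is an assumption for illustration: the `ensemble_answer` helper and the strategy callables are hypothetical, and majority voting is just one simple way to combine answers, not necessarily the aggregation the paper uses.

```python
from collections import Counter
from typing import Callable, Sequence


def ensemble_answer(question: str, strategies: Sequence[Callable[[str], str]]) -> str:
    """Ask every strategy and return the most common answer.

    Each strategy is a hypothetical callable that wraps one steered
    reasoning prompt and returns a final answer string.
    Ties are broken in favor of the earliest strategy in the list.
    """
    if not strategies:
        raise ValueError("need at least one strategy")
    answers = [strategy(question) for strategy in strategies]
    counts = Counter(answers)
    top = max(counts.values())
    # Return the first answer (in strategy order) that reaches the top count.
    return next(answer for answer in answers if counts[answer] == top)
```

With four strategies where three agree, the shared answer wins; a single dissenting strategy cannot change the result on its own.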

0

111

mbeissinger retweeted

South Park Commons

@southpkcommons

11 months ago

Questions over answers. We don't have requests for startups. We have requests for curiosity. Here's what we're most curious about right now.

4

182

20

157

42K

mbeissinger retweeted

Simon Willison

@simonw

12 months ago

If you use "AI agents" (LLMs that call tools) you need to be aware of the Lethal Trifecta. Any time you combine access to private data with exposure to untrusted content and the ability to externally communicate, an attacker can trick the system into stealing your data!
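The three conditions in that tweet can be written down as a simple configuration check. This is a minimal sketch under stated assumptions: the `AgentCapabilities` class and `has_lethal_trifecta` function are hypothetical names made up for illustration, not part of any real agent framework, and deciding how to set each flag for a given tool setup is left to the reader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical summary of what an agent's tools allow it to do."""
    reads_private_data: bool
    sees_untrusted_content: bool
    can_communicate_externally: bool


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three risky capabilities are present together."""
    return (
        caps.reads_private_data
        and caps.sees_untrusted_content
        and caps.can_communicate_externally
    )
```

Under this framing, removing any one of the three capabilities makes the check return `False`, which mirrors the tweet's point that the danger comes from the combination.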

88

2K

531

2K

656K

mbeissinger retweeted

South Park Commons

@southpkcommons

12 months ago

14 teams showcased their (live) demos at our Summer Demo Faire last week! Get a glimpse at the incredible products these SPC founders are building 🔽

https://t.co/KEwdpy38Nn

2

211

15

155

42K

mbeissinger retweeted

Aditya Agarwal

@adityaag

about 1 year ago

1/ We’re humbled to announce @southpkcommons Fund III: $275M to support exceptional founders from day -1. Since 2016, we’ve had a simple thesis: greatness is more likely to emerge when high talent density meets high curiosity. That's why we focus on -1 to 0.

https://t.co/Ln8DALgaTW

59

700

50

141

147K

Markus Beissinger @mbeissinger

about 1 year ago

General Agents now SOTA for realtime computer use - awesome work!

Sherjil Ozair

@sherjilozair

about 1 year ago

Today I'm launching my new company @GeneralAgentsCo and our first product. Introducing Ace: The First Realtime Computer Autopilot. Ace is not a chatbot. Ace performs tasks for you. On your computer. Using your mouse and keyboard. At superhuman speeds!

348

3K

323

2K

871K

0

2

0

186

Markus Beissinger @mbeissinger

about 1 year ago

@sherjilozair @GeneralAgentsCo Amazing!

0

2

0

176

mbeissinger retweeted

South Park Commons

@southpkcommons

about 1 year ago

How far will Claude go? What can you create with the latest @AnthropicAI models? We're excited to announce the agent-focused SPC-Anthropic hackathon the weekend of April 11th in San Francisco!

https://t.co/yWwb3APWmb

2

97

15

35

10K

mbeissinger retweeted

Aditya Agarwal

@adityaag

over 1 year ago

Over the last 11 years at @microsoft @satyanadella has had one of the greatest runs of any CEO, ever. I'm thrilled to announce he will be joining us March 4th at @southpkcommons to tell us how – and share what he sees on the horizon.

https://t.co/9sz6qhxYlx

19

465

38

41

42K

Markus Beissinger @mbeissinger

over 1 year ago

Will be at NeurIPS next week in Vancouver -- dm if you'd like to meet up and discuss all things related to agents+product, startups, and using LLMs for auto-optimization!

0

116
