tom zollo

Verified account

@SquareZollo

girl/boy dad & adventures @zemelgroup. formerly barstool and music stuff.

Tarrytown, NY

Joined February 2011

686 Following

1.5K Followers

191 Posts

Pinned Tweet

about 1 month ago

There’s been lots of interest in LLM calibration over the last few years, especially recently for reasoning LLMs. But most methods still require labeled data or extra inference-time compute. Sometimes we’re not that lucky: e.g., a personalized QA assistant running on-device still needs calibrated confidence, but may not have ground-truth labels or the resources to generate lots of extra tokens. That’s exactly the challenge we address in our new paper: “Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation”

about 1 month ago

Excited to see more work on VLA calibration from @AvivTamarLab; I really like this formulation https://t.co/6AJwxQqNJi

about 1 month ago

Always a blast working with @jwang771 @zemelgroup. Paper link here: https://t.co/DZuU89E3P5

about 1 month ago

We tested this pretty extensively, generating over 5B tokens across 9 reasoning models (600M to 14B parameters) and 5 tasks spanning math, science, and open-domain QA. Our approach substantially outperforms unsupervised baselines based on token probabilities or verbalized confidence (which itself requires extra compute). It also remains strong under distribution shift, works in a black-box setting or without generating a response, and improves downstream decision-making in selective prediction and simulated linguistic calibration.
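To illustrate the selective-prediction use mentioned above, here is a minimal sketch. It does not reproduce the paper's calibration method. It assumes you already have one confidence score per answer, plus correctness labels for evaluation, and shows one common way such scores are used: abstain on the least confident answers and measure accuracy on the rest (accuracy at a given coverage), alongside binned expected calibration error (ECE). The helpers `selective_accuracy` and `expected_calibration_error` are hypothetical names, not taken from the paper's code.

```python
from typing import List


def selective_accuracy(confidences: List[float], correct: List[bool], coverage: float) -> float:
    """Accuracy on the `coverage` fraction of examples with the highest confidence.

    The remaining, less confident examples are treated as abstentions.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need one confidence per prediction, and at least one prediction")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    # Rank predictions from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    k = max(1, round(coverage * len(order)))  # number of predictions we actually answer
    kept = order[:k]
    return sum(correct[i] for i in kept) / k


def expected_calibration_error(confidences: List[float], correct: List[bool], n_bins: int = 10) -> float:
    """Binned ECE: weighted average of |accuracy - mean confidence| over equal-width bins on [0, 1]."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; the first bin also includes a confidence of exactly 0.
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - avg_conf)
    return ece


confs = [0.9, 0.2, 0.8, 0.4]
labels = [True, False, True, False]
print(selective_accuracy(confs, labels, coverage=0.5))  # 1.0
print(selective_accuracy(confs, labels, coverage=1.0))  # 0.5
```

Sweeping `coverage` from 1.0 downward traces an accuracy-coverage curve; with well-calibrated confidences, accuracy should rise as the model answers fewer, more confident questions.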

2 months ago

@iaindunning I think if you have data and the task is narrow, then smaller models might do the trick, e.g., you don’t need gpt-X to do sentiment analysis on customer reviews or intent classification on service requests. (But your customer assistant chatbot will have to be gpt-X.)

SquareZollo retweeted

@tee_oh_double_d

3 months ago

Our new preprint on parallelizing training of temporally precise spiking neural networks is out! We show up to 44x speedups over a conventional sequential baseline. 1/N https://t.co/PppcnTrPIq

3 months ago

@jason_lee328 @allen_ai @hungchiayu123 @bqwluckyone @OpenDriveLab Paper: https://t.co/ZI1PjFutXH Github: https://t.co/ABu0IcyG9N

3 months ago

When I started the VLA calibration project early in 2025, OpenVLA was pretty much the only model that I could use. Since then a bunch of new token-based VLAs have come out, so we updated our paper with new experiments on 4 VLAs.

3 months ago

New models include MolmoAct from @jason_lee328 @allen_ai, NORA from @hungchiayu123, and UniVLA from @bqwluckyone @OpenDriveLab. Updated results show that the relationship between accuracy and calibration may depend on model architecture and training objective, and that our approaches to prompt ensembling, action scaling, and time-aware monitoring generalize across models.

3 months ago

ty for making me feel sane @ziv_ravid

Ravid Shwartz Ziv

3 months ago

https://t.co/c6J9Ka78GH

3 months ago

@giffmana LLMs definitely forget lots of world knowledge during post-training though? DeepSeek and even Qwen3 are pretty bad at QA.

SquareZollo retweeted

Zemel Group @zemelgroup

3 months ago

New work on continual learning and controllable memory from the Zgroup!

3 months ago

We propose a framework that combines:
- LLM-based expected information gain for scoring candidate questions
- Heterogeneous GNN propagation to aggregate responses and attributes
- Per-round adaptive respondent selection under explicit budgets

By querying a small, informative subset of individuals, the model infers population-level responses through structured similarity.
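To make the first ingredient concrete, here is a minimal sketch of expected information gain (EIG) for scoring a candidate question, under stated assumptions: the unknown is a discrete set of hypotheses with a prior, and each hypothesis comes with a predicted answer distribution. In the framework described above those predictions come from an LLM; here a hard-coded table stands in for them (an assumption for illustration). EIG is the mutual information between hypothesis and answer, i.e. prior entropy minus expected posterior entropy. The names `expected_information_gain` and `pick_question` are hypothetical, and the GNN propagation and budgeted respondent selection are not shown.

```python
import math
from typing import Dict, List


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(x * math.log(x) for x in p if x > 0)


def expected_information_gain(prior: List[float], likelihood: List[List[float]]) -> float:
    """EIG of one question: H(prior) - E_a[H(posterior | a)].

    prior[h]: P(hypothesis h)
    likelihood[h][a]: P(answer a | hypothesis h); hypothetically LLM-predicted in the setting above
    """
    n_hyp, n_ans = len(prior), len(likelihood[0])
    eig = entropy(prior)
    for a in range(n_ans):
        # Marginal probability of answer a: P(a) = sum_h P(h) P(a | h)
        p_a = sum(prior[h] * likelihood[h][a] for h in range(n_hyp))
        if p_a == 0:
            continue
        # Bayes' rule gives the posterior over hypotheses after observing answer a
        posterior = [prior[h] * likelihood[h][a] / p_a for h in range(n_hyp)]
        eig -= p_a * entropy(posterior)
    return eig


def pick_question(prior: List[float], candidates: Dict[str, List[List[float]]]) -> str:
    """Greedily choose the candidate question with the highest EIG."""
    return max(candidates, key=lambda q: expected_information_gain(prior, candidates[q]))


prior = [0.5, 0.5]
candidates = {
    "informative": [[1.0, 0.0], [0.0, 1.0]],    # the answer reveals the hypothesis
    "uninformative": [[0.5, 0.5], [0.5, 0.5]],  # the answer is independent of the hypothesis
}
print(pick_question(prior, candidates))  # informative
```

A question whose answer is independent of the hypothesis scores an EIG of (numerically) zero, while one whose answer pins the hypothesis down scores ln 2 here, so the greedy choice favors questions whose answers separate the hypotheses.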

3 months ago

Last year we published our work on teaching an LLM to select questions to most efficiently gather information about an individual. What's a natural follow-up? How about selecting questions and individuals to most efficiently gather information about a group!

3 months ago

@ding_ruomeng @zhun_deng @zemelgroup We study a new problem setting: Adaptive Group Elicitation. Under real costs and missing data, the system must dynamically decide:
👉 ❓ Which question to ask
👉 👥 Which individuals to query
👉 🌐 How to leverage population structure to infer unobserved responses

3 months ago

@behrouz_ali @maxsbennett @zemelgroup @__YuWang__ Thanks, Ali! Titans has been a frequent topic of conversation in our group for a while. Such a great paper.

3 months ago

Super psyched to finally share our new continual learning paper “Tell Me What To Learn: Generalizing Neural Memory to be Controllable in Natural Language” https://t.co/1FQjxU8Dzk

SquareZollo retweeted

Johannes Oswald @oswaldjoh

3 months ago

Love this direction! @SquareZollo, this again looks like great work. Congrats!

3 months ago

@oswaldjoh Thank you Johannes!!
