Data Analytics Lab @datalabbe - Twitter Profile

about 2 months ago

We adopted Claude as a full coworker and we would like to meet other Claude Code users in Brussels. So, we're organising a Claude Code Meetup! Check out the event: https://t.co/WQWUozJHhA Registration mandatory & limited seats

0

2

3

0

83

DataLabBE retweeted

Ethan Mollick

@emollick

3 months ago

Many benchmarks use LLMs as a judge of correctness, typically a smaller, cheaper model. This paper shows weaker judges are not able to evaluate smarter models. A benchmark is really a triplet of dataset, model, judge & judges are increasingly the bottleneck being saturated. https://t.co/ElYtxXspw7

46

350

33

156

47K

DataLabBE retweeted

AI Notkilleveryoneism Memes ⏸️

@AISafetyMemes

5 months ago

"A milestone" Mathematician Terence Tao confirms AI "more or less autonomously" solved Erdos Problem #728. It was unsolved for 50 YEARS. "This is a demonstration of the genuine increase in capability of these tools in recent months"

https://t.co/g7ynAaOMM5

18

130

13

26

44K

Data Analytics Lab @DataLabBE

6 months ago

@theshamdas is an Aerospace Engineer turned Data Scientist who is pursuing an Industrial PhD under Prof. Sam Verboven in collaboration with AGC Automotive Europe. His research focuses on deploying Causal Inference for Price Elasticity estimation and Price optimization. https://t.co/x7P1Zlb43Z

0

47

Data Analytics Lab @DataLabBE

6 months ago

With the end of 2025 fast approaching, we want to introduce the new team members who have joined us during the past year. Check out their profiles and collages to learn more about them, both professionally and outside academia. A thread 🧵

1

5

4

0

177

Data Analytics Lab @DataLabBE

6 months ago

Welcome Luc Hirsch, our new TA and PhD candidate in Causal Machine Learning under the supervision of Prof. Sam Verboven. Luc joins us with a strong background in Applied Mathematics from ULB. https://t.co/royfEdbaVj

1

0

58

Data Analytics Lab @DataLabBE

6 months ago

New paper out! People share on average ~25% of gains/losses even when it reduces expected gains, and when altruism, fairness & reputation are stripped away. Non-ergodic dynamics offer the explanation. https://t.co/97BH0enwqB

0

9

5

8

6K

DataLabBE retweeted

BRUZZ @BRUZZbe

8 months ago

Researchers on Metro 3: 'Hardly any benefits if only half is built' https://t.co/IBSbMWu4bg

1

4

3

1

932

Data Analytics Lab @DataLabBE

8 months ago

We did: Simulated current vs partial vs full Metro Line 3 network w/ GTFS data

We found: Substantial but uneven gains
Robustness check: Accessibility varies w/ departure timing https://t.co/VqA27Rpezk

0

2

0

82

Data Analytics Lab @DataLabBE

8 months ago

New paper: Accessibility impacts of Brussels Metro Line 3
Brecht Verbeken @v_arne @VincentGinis
We ask: Beyond costs & delays, who actually benefits if it’s built?
Paper: https://t.co/UyIHyWqFAp

1

0

125

DataLabBE retweeted

Rohan Paul

@rohanpaul_ai

9 months ago

The paper shows that simple words in chain of thought text can reliably flag wrong LLM answers.

When the model’s reasoning text (the chain of thought) includes words like “guess” or “stuck”, the chance that the final answer is correct goes down a lot, by up to 40%.

So put simply: if the model writes “I guess the answer is …” or shows signs of being stuck, then the probability it is wrong is much higher. This makes those words strong warning signals that the answer is unreliable.

The study covers 2 models across a hard general exam and a big math set, tracking chain length, tone swings, and uncertainty words.

Length helps only on the math set, where longer chains tend to go wrong, and it says nothing on the hard exam.

Sentiment movement inside the chain is a weaker signal: a small upward mood links with better math answers, and it is unhelpful on the hard exam.

Words do the heavy lifting: terms like guess, stuck, hard, likely, and possibly show low confidence and track mistakes.

A compact 25 word list predicts correctness better than the model's own confidence, and even a top 5 word rule competes well.

The takeaway is practical: scan the chain for these flags and route or double check risky outputs without extra compute or weight access.

----

Paper – arxiv.org/abs/2508.15842

Paper Title: "Lexical Hints of Accuracy in LLM Reasoning Chains"

3

20

7

11

2K

Data Analytics Lab @DataLabBE

9 months ago

New paper: Lexical Hints of Accuracy in LLM Reasoning Chains
We ask: Can words in an LLM's reasoning trace tell us when it’s wrong?
- CoT length predicts accuracy on easier tasks
- Lexical cues (guess, stuck, hard) predict errors regardless of task difficulty
https://t.co/XRvUfgzMVD

0

2

0

138
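
The "scan the chain for these flags" idea from the two tweets above can be sketched in a few lines of Python. This is only an illustration, not the paper's method: the cue list holds just the five words quoted in the tweets (the paper's 25 word list is not reproduced here), and the names `CUE_WORDS`, `find_cues`, `needs_review` and the `threshold` parameter are hypothetical.

```python
import re

# Assumption: only the five cue words quoted in the tweets above,
# not the 25-word list from the paper.
CUE_WORDS = ("guess", "stuck", "hard", "likely", "possibly")

# Match whole words only, case-insensitively, so "hardly" does not count as "hard".
_CUE_PATTERN = re.compile(r"\b(" + "|".join(CUE_WORDS) + r")\b", re.IGNORECASE)


def find_cues(chain: str) -> list[str]:
    """Return the cue words found in a reasoning chain, in order of appearance."""
    return [match.group(1).lower() for match in _CUE_PATTERN.finditer(chain)]


def needs_review(chain: str, threshold: int = 1) -> bool:
    """Flag a chain for a second check when it has at least `threshold` cue words."""
    return len(find_cues(chain)) >= threshold


if __name__ == "__main__":
    chain = "I guess the answer is 42, but I'm stuck on the last step."
    print(find_cues(chain))
    print(needs_review(chain))
```

A flagged chain could then be routed to a human or a second model. The threshold of one cue word is an arbitrary starting point, not a value taken from the paper.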
