Bayesian ML at Scale @bayesianin - Twitter Profile

@xwanyex I think Zitron's main points are around finance, which I don't know well enough to comment on, but I find his perspective interesting. This is a parallel point to whether LLMs are impressive or useful.

0

25

Bayesian ML at Scale

@BayesianIn

3 months ago

Christopher Sims October 21, 1942 – March 14, 2026 https://t.co/BPSUpMVJPT

0

2

0

163

BayesianIn retweeted

Judea Pearl

@yudapearl

4 months ago

A friend just informed me that our colleague, Professor Arthur Dempster, died last month at 96. Arthur was an intellectual giant, famous for developing the EM algorithm as well as for the Shafer-Dempster theory, but remained a skeptic about causation. https://t.co/lqQyEKjTPM. May his memory be an inspiration.

4

39

9

5K

Bayesian ML at Scale

@BayesianIn

4 months ago

Well, everything (except probability)

Richard Sutton

@RichardSSutton

4 months ago

Yann is right about everything (except RL).

47

1K

61

398

209K

0

1

0

222

Bayesian ML at Scale

@BayesianIn

5 months ago

Thanks for the response, I will wait and see if Prof Pearl has the same interpretation as you. From an operational point of view, if M1 and M2 only differ in something that isn't observed, then I would say they don't really differ at all, not just in the likelihood but for any practical purpose. How does "M2 - a drug that cures 10% and kills 10%" make sense unless it's possible to differentiate the 10% who are cured from the 10% who are killed? Otherwise you would just say it does nothing (like M1).

1

0

66

Bayesian ML at Scale

@BayesianIn

5 months ago

@yudapearl Have a relaxing break and I hope they are friendly alligators.

1

0

56

Bayesian ML at Scale

@BayesianIn

5 months ago

@DongNguyeb @yudapearl https://t.co/AtOX4CD8xv

Bayesian ML at Scale

@BayesianIn

5 months ago

Thanks for the problem. Let me try to understand it.

X=1 means the patient has the inclination to take the drug and X=2 means the patient does not have the inclination to take the drug. These two inclinations are equally probable, so P(X=1)=P(X=2)=0.5.

Under M2:
P(death|do(Drug),X=1,M2)=0.2
P(death|do(Drug),X=2,M2)=0
P(death|do(Placebo),X=1,M2)=0.1
P(death|do(Placebo),X=2,M2)=0.1

M1 is that the Drug behaves like the Placebo (sugar tablet):
P(death|do(Drug),X=1,M1)=0.1
P(death|do(Drug),X=2,M1)=0.1
P(death|do(Placebo),X=1,M1)=0.1
P(death|do(Placebo),X=2,M1)=0.1

So by the backdoor rule, averaging over X: P(death|do(Drug),M1)=P(death|do(Drug),M2)=0.1 (under M2 this is 0.5*0.2+0.5*0). With X unobserved, the likelihood for model=M1 and for model=M2 is therefore the same in an RCT.

If there is an additional single observation of: death, drug, X=1 (patient has the inclination to take the drug, and this was observed), the probability of this single observation is 0.2 under M2 and 0.1 under M1. With equal prior probability on M1 and M2, this makes the posterior probability of M2 equal to 2/3.

I must confess I am confused about:
a) Why M2 involves a covariate which is the inclination to take the drug, rather than another easily measured attribute. Does this make the example more interesting?
b) Why you think this poses a challenge to Bayes.

It's entirely possible I misunderstood an aspect of this example.
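The arithmetic in this tweet can be checked with a short sketch. It uses only the probabilities stated in the tweet; the function names and the equal prior on M1 and M2 are assumptions made here for illustration.

```python
# Numbers taken from the tweet: P(X=1) = P(X=2) = 0.5, and
# P(death | do(Drug), X, model) for each model.
p_x = {1: 0.5, 2: 0.5}
p_death_drug = {
    "M1": {1: 0.1, 2: 0.1},  # drug behaves like the placebo
    "M2": {1: 0.2, 2: 0.0},  # drug harms X=1 and protects X=2
}


def marginal_death(model):
    """P(death | do(Drug), model) with X unobserved (average over X)."""
    return sum(p_x[x] * p_death_drug[model][x] for x in p_x)


def posterior_m2(prior_m2=0.5, x=1):
    """Posterior of M2 after one observed death on the drug with X = x.

    prior_m2 = 0.5 is an assumption: the tweet's 2/3 implies equal priors.
    """
    like_m1 = p_death_drug["M1"][x]
    like_m2 = p_death_drug["M2"][x]
    numerator = prior_m2 * like_m2
    return numerator / (numerator + (1 - prior_m2) * like_m1)


print(marginal_death("M1"), marginal_death("M2"))  # the two marginals agree
print(posterior_m2())  # close to 2/3
```

Because the two marginals agree, an RCT that does not record X cannot move the prior between M1 and M2; only the observation that includes X does.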

1

3

0

1

3K

1

0

55

Bayesian ML at Scale

@BayesianIn

5 months ago

While this is more or less the main RecSys heuristic, and it is hard to beat, I do think we should try to do better. Being satisfied with a completely ad-hoc solution is not a long-term path to progress.

Pedro Domingos

@pmddomingos

5 months ago

Simple way to replace RL with supervised learning: assign the reward to every action on the path to it and learn to predict it. Hypothesis: no RL algorithm will ever beat this by much.
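A minimal sketch of the heuristic described in this tweet, under stated assumptions: every (state, action) pair on an episode's path is labelled with that episode's final reward, and a predictor is fit to those labels. The predictor here is a plain per-pair average, swapped in for whatever supervised learner one would actually use; the episode data and all function names are hypothetical.

```python
from collections import defaultdict


def build_dataset(episodes):
    """Label every (state, action) on a path with the episode's final reward."""
    return [(s, a, reward) for path, reward in episodes for s, a in path]


def fit_mean_reward(dataset):
    """Tabular stand-in for 'learn to predict it': mean reward per (state, action)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s, a, r in dataset:
        totals[(s, a)] += r
        counts[(s, a)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def greedy_action(model, state, actions):
    """Pick the action with the highest predicted reward (0.0 if unseen)."""
    return max(actions, key=lambda a: model.get((state, a), 0.0))


# Hypothetical toy episodes: (path of (state, action) pairs, final reward).
episodes = [
    ([("s0", "left"), ("s1", "up")], 1.0),
    ([("s0", "right"), ("s2", "up")], 0.0),
    ([("s0", "left"), ("s1", "down")], 0.0),
]
model = fit_mean_reward(build_dataset(episodes))
print(greedy_action(model, "s0", ["left", "right"]))
```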

37

194

5

162

35K

0

4

1

406

Bayesian ML at Scale

@BayesianIn

6 months ago

@DongNguyeb Many people find that when they specify their probability (point of indifference between buying and selling bets) over repeated measures, e.g. x1..xn, their probabilities are exchangeable (and hence have a de Finetti representation).
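For reference, the de Finetti representation mentioned here is usually written as below for an infinitely exchangeable sequence. The parameter \theta, the mixing measure \pi and the density p(x \mid \theta) are notation assumed for this sketch, not taken from the thread; for a finite exchangeable sequence the representation holds only approximately.

```latex
P(x_1, \dots, x_n) = \int \prod_{i=1}^{n} p(x_i \mid \theta) \, d\pi(\theta)
```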

0

1

0

31

Bayesian ML at Scale

@BayesianIn

6 months ago

The @yudapearl Pearlian view is that causal inference is a completely separate discipline from statistical inference (and statistical estimation can be tackled using either the Bayesian or frequentist paradigm). Causal inference is then "inference across distributions", that is, a modification of a (frequentist) probability that accounts for an intervention. Here I quote @analisereal

1

5

2

0

391

Bayesian ML at Scale

@BayesianIn

6 months ago

@DongNguyeb You may assume a probability specification is exchangeable in any sense.

1

0

63

Bayesian ML at Scale

@BayesianIn

6 months ago

If you deem a future outcome on a unit that receives a treatment exchangeable with past outcomes on units that received the treatment, and a future outcome on a unit that receives no treatment exchangeable with past outcomes on units that received no treatment, then this is a powerful and consequential assumption that enables causal inference.

1

0

87

Bayesian ML at Scale

@BayesianIn

6 months ago

@DongNguyeb It is a price or a point of indifference between buying and selling bets.

0

49

Bayesian ML at Scale

@BayesianIn

6 months ago

@DongNguyeb Yes, I am not using counterfactuals. If you have y1,...,yn|t1,...,tn then there is exchangeability of yi,yj if ti=tj.

2

0

94

Bayesian ML at Scale

@BayesianIn

6 months ago

@yudapearl I feel like I am repeating myself, and I suspect you feel the same. At a later date, I will use a different forum to try to outline your point of view (as best I understand it) and the small points in which I have a differing view.

1

0

60

Bayesian ML at Scale

@BayesianIn

6 months ago

> "I don't understand the urge people have to demonstrate "we don't need this machinery", especially when the alternative machinery they propose is so cognitively cumbersome."

This is a separate question, more about preference, taste and foundations.

The advantages of basing causal inference purely on the Ramsey-de Finetti-Savage theory of statistics are:
- Automatically consistent with the most complete axiom system for decision making under uncertainty that we know.
- An operational procedure for determining conditional exchangeability relationships usually needed for causal inference.
- Likely incoherence and inadmissibility arguments can be made against a two-step procedure of estimating a joint probability and then applying causal inference. Yes, this is academic and perhaps of little practical consequence, but still of some concern from a foundational point of view.

Some disadvantages include:
- Belief that frequentist probability and causal concepts are more intuitive than Bayesian probability and conditional exchangeability.
- Ability to ignore covariates, greatly simplifying certain analyses.

Feel free to add more.

1

0

40

BayesianIn retweeted

Bayesian ML at Scale

@BayesianIn

6 months ago

Thanks for the engagement and the thoughtful response. It is difficult to outline my (many) points of agreement and the few points where I differ in an X post, but I will try.

> "I've never insisted on the "frequency interpretation" of probability."

In this paper statistical analysis is defined in frequentist terms.
https://t.co/GDR7FGbNmK

I acknowledge I am being particularly purist here, but to a strict (operational subjective) Bayesian, probability does not exist, and the idea of a Bayesian estimator is a contradiction in terms.

The concept of "experimental conditions remaining the same" only makes sense with a frequentist notion of repeated draws from a low-dimensional probability model.

1

0

1

0

130
