@xwanyex I think Zitron's main points are around finance, which I don't know well enough to comment on, but I find his perspective interesting. This is a parallel point to are LLMs are impressive or useful.
A friend just informed me that our colleage, Professor Arthur Dempster, has died last month at 96. Arthur was an intellectual giant, famous for developing the EM algorithm as well as for the Shafer-Dempster theory, but remained skeptic about causation.
https://t.co/lqQyEKjTPM. May his memory be an inspiration.
Thanks for the response, I will wait and see if Prof Pearl has the same interpretation as you.
From an operational point of view, if M1 and M2 only differ in something that isn't observed, then I would say they don't really differ at all, not just in the likelihood but for any practical purpose.
How does: "M2 - a drug that cures 10% and kills 10%."
Make sense unless it's possible to differentiate the 10% who are cured and the 10% who are killed? Otherwise you would just say it does nothing (like M1).
Thanks for the problem.
Let me try to understand it.
X=1 is the patient has the inclination to take the drug and X=2 is the patient does not have the inclination to take the drug, these two inclinations are equally probable, so
P(X=1)=P(X=2)=0.5
P(death|do(Drug),X=1,M2)=0.2
P(death|do(Drug),X=2,M2)=0
P(death|do(Placebo),X=1,M2)=0.1
P(death|do(Placebo),X=2,M2)=0.1
M1 is that the Drug behaves like the Placebo (sugar tablet)
P(death|do(Drug),X=1,M1)=0.1
P(death|do(Drug),X=2,M1)=0.1
P(death|do(Placebo),X=1,M1)=0.1
P(death|do(Placebo),X=2,M1)=0.1
So by the backdoor rule:
P(death|do(Drug),M1)=P(death|do(Drug),M2)=0.1, so with X unobserved the likelihood is identical in both cases. So the likelihood for model=M1 and for model=M2 are the same in an RCT which does not observe X.
If there is an additional single observation of:
death, drug, X=1 (patient has the inclination to take the drug, and this was observed)
The probability of this single observation is 0.2 under M2 and 0.1 under M1. This makes the posterior probability of M2 as 2/3.
I must confess I am confused about
a) Why M2 involves a covariate which is the inclination to take the drug, rather than another easily measured attribute. Does this make the example more interesting?
b) Why you think this poses a challenge to Bayes.
It's entirely possible I misunderstood an aspect of this example.
Thanks for the problem.
Let me try to understand it.
X=1 is the patient has the inclination to take the drug and X=2 is the patient does not have the inclination to take the drug, these two inclinations are equally probable, so
P(X=1)=P(X=2)=0.5
P(death|do(Drug),X=1,M2)=0.2
P(death|do(Drug),X=2,M2)=0
P(death|do(Placebo),X=1,M2)=0.1
P(death|do(Placebo),X=2,M2)=0.1
M1 is that the Drug behaves like the Placebo (sugar tablet)
P(death|do(Drug),X=1,M1)=0.1
P(death|do(Drug),X=2,M1)=0.1
P(death|do(Placebo),X=1,M1)=0.1
P(death|do(Placebo),X=2,M1)=0.1
So by the backdoor rule:
P(death|do(Drug),M1)=P(death|do(Drug),M2)=0.1, so with X unobserved the likelihood is identical in both cases. So the likelihood for model=M1 and for model=M2 are the same in an RCT which does not observe X.
If there is an additional single observation of:
death, drug, X=1 (patient has the inclination to take the drug, and this was observed)
The probability of this single observation is 0.2 under M2 and 0.1 under M1. This makes the posterior probability of M2 as 2/3.
I must confess I am confused about
a) Why M2 involves a covariate which is the inclination to take the drug, rather than another easily measured attribute. Does this make the example more interesting?
b) Why you think this poses a challenge to Bayes.
It's entirely possible I misunderstood an aspect of this example.
While this is more or less the main RecSys heuristic, and it is hard to beat, I do think we should try to do better. Being satisfied with a completely ad-hoc solution is not a long term path to progress.
Simple way to replace RL with supervised learning: assign the reward to every action on the path to it and learn to predict it.
Hypothesis: no RL algorithm will ever beat this by much.
@DongNguyeb Many people find that when they specify their probability (point of indifference to buying and selling bets) over repeated measures e.g. x1..xn that their probabilities are exchangeable (and hence have a de Finetti representation).
The @yudapearl Pearlian view is that causal inference is a completely separate discipline to statistical inference (and statistical estimation can be tackled using either the Bayesian or frequentist paradigm) and then causal inference is "inference across distributions", that is a modification of a (frequentist) probability that accounts for an intervention. Here I quote @analisereal
If you deem a future outcome on a unit that receives a treatment exchangeable with past outcomes that received a treatment and a future outcome on a unit that receives no treatments exchangeable with past outcomes on units that received no treatments then this is a powerful and consequential assumption that enables causal inference.
@yudapearl I feel like I am repeating myself, and I suspect you feel the same.
At a later date, I will use a different forum to try to outline your point of view (as best I understand it) and the small points in which I have a differing view.
> "I don't understand the urge people have to demonstrate "we don't need this machinery", especially when the alternative machinery they propose is so cognitively cumbersome."
This is a separate question, more around preference, taste and foundations.
The advantage of basing causal inference purely on the Ramsey-de Finetti-Savage theory of statistics are:
- Automatically consistent with the most complete axiom system for decision making under uncertainty that we know.
- An operational procedure for determining conditional exchangeability relationships usually needed for causal inference.
- Likely incoherence and inadmissibility arguments can be made against a two step procedure of estimate a joint probability then apply causal inference. Yes, this is academic and perhaps of little practical consequence, but still of some concern from a foundational point of view.
Some disadvantages include:
- Belief that frequentist probability and causal concepts are more intuitive, than Bayesian probability and conditional exchangeability.
- Ability to ignore covariates, greatly simplifying certain analyses.
Feel free to add more.
Thanks for the engagement and the thoughtful response. It is difficult to outline my (many) points of agreement and the few points where I differ in an X post, but I will try.
> "I've never insisted on the "frequency interpretation" of probability."
In this paper statistical analysis is defined in frequentist terms.
https://t.co/GDR7FGbNmK
I acknowledge I am being particularly purist here, but to a strict (operational subjective) Bayesian, probability does not exist, and the idea of a Bayesian estimator is a contradiction in terms.
The concept of "experimental conditions remaining the same" only makes sense with a frequentist notion of repeated draws from a low-dimensional probability model.