graduate in public admin and development studies @lunduniversity / digital visa processing for @GermanyDiplo / mountaineering @DAV_Alpenverein and @NaturFreunde
Finally published with LUP Student Papers:
https://t.co/Q5NtXzTW5M
Thesis about official development assistance, decoloniality, historical institutionalism, and public administration research.
Development Studies @lunduniversity and @pol_LU @MilkaIvHa @MatsFred1
I just learned that Boris Barth died last autumn. Am glad that I took both his lecture about European expansion @UniKonstanz and his seminar @UniKarlova. He was a great historian. #History
My short article about opportunities and challenges of European University Alliances was published last month with @EU_RESEARCH:
https://t.co/EV3MRGygeN
supported by @bwstiftung and @HRK_aktuell (Brussels Office) #HigherEducation
Every Data Scientist needs to know these ideas.
They will blow your mind.
1. Correlation vs Causation
P(A | B) is the probability of A given B. It is the probability that we will observe A given that we have already observed B.
P(A | do(B)) is the probability of A given do(B). It is the probability that we will observe A given that we have intervened to cause B to happen.
In this context, an intervention simply means to take an action of some kind. Therefore do(B) means to take an action which causes B to happen.
The expressions P(A | B) and P(A | do(B)) might seem very similar but they represent very different situations.
2. From the data alone, we can only learn P(A | B).
Bob has an extremely accurate weather app and is always very good about bringing his umbrella when it rains. We observe Bob over several years and find that whenever it rains, Bob has his umbrella, and he never brings his umbrella on days when it doesn't rain.
In the language of probability, we say P(Umbrella | Rain) = 1 and P(Rain | Umbrella) = 1 as well.
What we can learn from this data alone is how to predict whether it rains with 100% accuracy by checking whether Bob has an umbrella. We can also learn to predict with 100% accuracy whether Bob has an umbrella by checking whether it's going to rain.
What we cannot learn is what will happen if we give Bob an umbrella on a random day of our choosing. The answer to this question is P(Rain | do(Umbrella)) and it's unknowable from the data alone.
We need prior knowledge about how the world works to properly interpret the data we collected. We need to know that rain has an effect on Bob's behavior, but Bob's behavior has no effect on the rain.
Information about the effects of interventions is simply not available in raw data unless that data is collected by controlled experimental manipulation.
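Here is a minimal simulation sketch of Bob's story. The 30% chance of rain, the function name simulate_bob and the choice to hand Bob an umbrella every single day are my own assumptions for illustration; nothing here is measured data.

```python
import random

def simulate_bob(days=10_000, p_rain=0.3, seed=0):
    """Estimate P(Rain | Umbrella) and P(Rain | do(Umbrella)) by simulation."""
    rng = random.Random(seed)

    # Observational world: rain causes Bob to carry his umbrella.
    observed = []
    for _ in range(days):
        rain = rng.random() < p_rain
        umbrella = rain  # Bob's weather app is assumed to be perfect
        observed.append((rain, umbrella))
    rain_on_umbrella_days = [rain for rain, umbrella in observed if umbrella]
    p_observed = sum(rain_on_umbrella_days) / len(rain_on_umbrella_days)

    # Interventional world: we hand Bob an umbrella whatever the weather is.
    rain_on_forced_days = []
    for _ in range(days):
        rain = rng.random() < p_rain  # the weather ignores our intervention
        rain_on_forced_days.append(rain)
    p_intervened = sum(rain_on_forced_days) / len(rain_on_forced_days)

    return p_observed, p_intervened

p_obs, p_do = simulate_bob()
print(f"P(Rain | Umbrella)     = {p_obs:.2f}")
print(f"P(Rain | do(Umbrella)) = {p_do:.2f}")
```

The first number is 1 by construction. The second lands near the assumed base rate of rain, but only because I wrote "the weather ignores our intervention" into the simulation. That line is the prior knowledge about the world, and it did not come from the observed data.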
3. Scientific Experiments work because they produce a very special kind of data.
You may have heard of what many people call a scientific experiment. Take a collection of objects, animals or people. Randomly split that collection into a control group and a treatment group. Apply your intervention to the treatment group while leaving the control group alone. If you observe any differences between the treatment group and the control group, it is logical to attribute these differences to the treatment. You can therefore say the differences were caused by the treatment.
In statistics, the procedure I just described is called a Randomized Controlled Trial. It is a procedure for generating a specific kind of data where:
P(Difference | Treatment) = P(Difference | do(Treatment))
This is why traditional science experiments work. They are designed to capture causal information. This is not the case for the vast majority of data that we collect in society.
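To make the role of randomization concrete, here is a small sketch with a hidden confounder. Everything in it is hypothetical: the "healthy" variable, the recovery probabilities and the function names are assumptions I chose so that the true effect of treatment is exactly 0.2.

```python
import random

def treated_minus_untreated(assign_treatment, n=20_000, seed=1):
    """Recovery rate among the treated minus recovery rate among the untreated."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        healthy = rng.random() < 0.5  # hidden confounder (assumed)
        treatment = assign_treatment(rng, healthy)
        # Assumed outcome model: treatment adds 0.2, being healthy adds 0.4.
        p_recover = 0.2 + 0.2 * treatment + 0.4 * healthy
        recovered = rng.random() < p_recover
        (treated if treatment else untreated).append(recovered)
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

def self_selected(rng, healthy):
    # Observational data: healthy people opt into treatment more often.
    return rng.random() < (0.8 if healthy else 0.2)

def randomized(rng, healthy):
    # Randomized trial: a coin flip decides, ignoring health.
    return rng.random() < 0.5

obs_diff = treated_minus_untreated(self_selected)
rct_diff = treated_minus_untreated(randomized)
print(f"observational difference: {obs_diff:.2f}")
print(f"randomized difference:    {rct_diff:.2f}")
```

Under these assumptions the randomized difference sits close to the true effect of 0.2, while the self-selected difference is roughly twice as large because it mixes the treatment effect with the effect of being healthy.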
Without human guidance or access to real world knowledge, statistical algorithms and artificial intelligences can only learn P(A | B) from the raw data. This is a fundamental mathematical limitation on the use of data alone.
That's it for now. This post is part of a series about causal inference. The posts are based on The Book of Why by Judea Pearl, with lots of commentary from me.
Follow me (@kareem_carr) so you don't miss out on the next post.
Please show support by liking and retweeting the thread.
Apparently there is disturbance in Swedish global development research because the government intends to cut off enormous amounts of funding.
The most important point, in my reading: the government confuses humanitarian needs with development research.
https://t.co/jXripzRT3G
Big article in the respected journal Nature about the government's abrupt halt of the call for development research funding. Embarrassing for Sweden, incompetently handled by the government. https://t.co/CEbo0C0p5m
Excited to see that my paper with @nils_weidmann about pro-government mobilization in authoritarian regimes is available online first at Comparative Political Studies. It is open access! Looking forward to your comments and thoughts.
1/4
https://t.co/bkcnsbivV6
@ZdenekHrib I am in Lund, Sweden, this week, and here the combination of buses, walking, cycling, trams, good connections to fast long-distance trains, delivery transport, and very, very few, slowed-down private cars works excellently. Just get rid of the cars in Prague. Most people will like it.
One of my favorite papers in recent years included this diagram. It shows the impact of controlling for three different types of variables: confounders, colliders, and mediators.
With confounders, control is good. With the others, you ruin your result by controlling.
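A toy sketch of the collider case only, not taken from the paper: x and y are independent coin flips, and c is a made-up collider that is true whenever either of them is. "Controlling" here just means looking only at rows where c is true.

```python
import random

rng = random.Random(2)
rows = []
for _ in range(20_000):
    x = rng.random() < 0.5
    y = rng.random() < 0.5  # independent of x by construction
    c = x or y              # collider: caused by both x and y
    rows.append((x, y, c))

def p_y_given_x(data, x_value):
    """Share of rows with y true among rows where x equals x_value."""
    ys = [y for x, y, _ in data if x == x_value]
    return sum(ys) / len(ys)

selected = [row for row in rows if row[2]]  # keep only c == True

gap_all = p_y_given_x(rows, True) - p_y_given_x(rows, False)
gap_selected = p_y_given_x(selected, True) - p_y_given_x(selected, False)
print(f"gap in P(y) between x groups, all rows:       {gap_all:.2f}")
print(f"gap in P(y) between x groups, c == True only: {gap_selected:.2f}")
```

In all rows the gap is close to zero, as it should be for independent variables. Among the selected rows it is about -0.5, because when x is false the only way c can be true is for y to be true.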
@malajankaa @PiratIvanBartos @mar_cvk @PiratskaStrana Dr. Michailidu, don't let conservatives put you down in this matter. Left-wing opinions deserve their place in the Czech discourse. It looks like @PiratIvanBartos is publicly offending you like this only out of political opportunism and ideological confusion.