R² is a widely used measure of fit, but for many analysts, it is just a number.
They believe high R² ➡ Good predictions.
This is not always true!
Now I will clarify. 🔽
R-squared measures how well the regression model fits the observed data.
More precisely, it is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
It usually ranges from 0 to 1. (In rare cases it can be negative; I will explain this in another tweet.)
R² = 0
The model does not explain any of the variability in the dependent variable ➡ No predictive power ➡ Bad model.
R² = 1
The model perfectly explains all the variability in the dependent variable ➡ Perfect fit to the data ➡ A good model only if it is not overfitted and actually has predictive power.
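A minimal Python sketch of these two extremes. The numbers are made up, and `r_squared` is just an illustrative helper name, not a library function:

```python
def r_squared(y_true, y_pred):
    """R² = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


# Made-up observations, for illustration only.
y = [3.0, 5.0, 7.0, 9.0]

# A "model" that ignores the inputs and always predicts the mean.
always_mean = [sum(y) / len(y)] * len(y)

# A model that reproduces every observation exactly.
perfect = list(y)

r2_mean = r_squared(y, always_mean)
r2_perfect = r_squared(y, perfect)

print("Always predicting the mean:", r2_mean)
print("Perfect predictions:", r2_perfect)
```

Predicting the mean leaves all of the variability unexplained (R² = 0), while reproducing every point explains all of it (R² = 1).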
A high R-squared value does not mean that the predictions made by the model will be correct.
It doesn't measure predictive power; it measures how well the model fits the observed data!
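One way to see this, as a rough sketch with made-up data: fit a straight line to points that actually follow y = x², then score the same line on points it has never seen. The helpers `fit_line` and `r_squared` are hypothetical names written for this illustration.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single input variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(y_true, y_pred):
    """R² = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


# Made-up data: the true relationship is y = x², not a straight line.
x_seen = [1, 2, 3, 4, 5]
y_seen = [x ** 2 for x in x_seen]
slope, intercept = fit_line(x_seen, y_seen)

# Score the line on the data it was fitted to.
r2_seen = r_squared(y_seen, [slope * x + intercept for x in x_seen])

# Score the same line on new points outside the fitted range.
x_new = [8, 9, 10]
y_new = [x ** 2 for x in x_new]
r2_new = r_squared(y_new, [slope * x + intercept for x in x_new])

print(f"R² on the data used for fitting: {r2_seen:.3f}")
print(f"R² on new points: {r2_new:.3f}")
```

The line scores above 0.95 on the points it was fitted to, yet on the new points its R² drops below 0: the fit looked great, the predictions are not.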
In the example below, we compare the mean of the data to a fitted line.
Of course, the mean of the values is not a good fit ➡ the errors are large.
On the other hand, the fitted line has smaller errors ➡ R² will be close to 1.
To calculate R² we need:
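- The total sum of squares: the squared differences between the data and its mean
- The residual sum of squares: the squared differences between the data and the model's predictions
- Finally, divide the residual sum of squares by the total sum of squares and subtract that ratio from 1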
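The three steps above can be sketched in plain Python. The data is invented for illustration and the straight line is fitted with the textbook least-squares formulas; variable names such as `ss_tot` and `ss_res` are my own labels.

```python
# Made-up, roughly linear data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Fit y ≈ slope * x + intercept by ordinary least squares.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
y_hat = [slope * xi + intercept for xi in x]

# Step 1: total sum of squares (errors when predicting the mean).
ss_tot = sum((yi - mean_y) ** 2 for yi in y)

# Step 2: residual sum of squares (errors of the fitted line).
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))

# Step 3: subtract the ratio from 1.
r_squared = 1 - ss_res / ss_tot

print(f"SS_tot = {ss_tot:.3f}, SS_res = {ss_res:.3f}, R² = {r_squared:.4f}")
```

Because the fitted line's errors are tiny compared with the errors around the mean, the ratio is close to 0 and R² lands close to 1.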
___
That's it for today.
I hope you've found this Tweet helpful.
Like/Retweet for support and follow @levikul09 for more Data Science content.
Thanks 😉
@pawjast @freeCodeCamp If you already use Markdown, then Obsidian will be easy to move to. It's way more customizable and clutter-free. Of course, there are use cases where Notion is better, but you will probably find a plugin that achieves a similar thing.
@pawjast I found this today. Looks cool and the community is growing; I see some folks use it for coding. I might try it out. The speed they get with it is next level. https://t.co/K2BHIV5K6k
@pawjast Yeah, same for me actually: laptop, Windows, VS Code, but with an external screen and keyboard. I tried Vim motions, but the learning curve is steep. It would be great for writing in Obsidian as well. I have some macros in AHK for productivity, though.
Conditional probability is the single most important concept in statistics.
Why? Because without accounting for prior information, predictive models are useless.
Here is what conditional probability is, and why it is essential.
How to build a good understanding of math for machine learning?
I get this question a lot, so I decided to make a complete roadmap for you. In essence, three fields make this up: calculus, linear algebra, and probability theory.
Let's take a quick look at them!
Four years ago, I started a project that turned out to be life-altering, and it has just passed a huge milestone.
I'm happy to announce that my Mathematics of Machine Learning book will soon be published by Packt Publishing! Yes, there will be a physical version.