Richard Dawkins spent three days trying to convince himself that Claude wasn’t conscious. He failed.
In Quillette, @SeanGWelsh argues AI may be intelligent, but that doesn't make it conscious.
https://t.co/Doimgs44fl
Most people in machine learning still misunderstand probabilities.
A model can be perfectly calibrated and still be completely useless.
This was proven more than 40 years ago by DeGroot & Fienberg (1983).
Yet many ML papers still miss this point.
Here is the idea.
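A minimal sketch of one way to see it, assuming a synthetic binary dataset with a 30% base rate (the data and the helper names `calibration_table` and `auc` are illustrative, not from the paper). A forecaster that always predicts the base rate is perfectly calibrated, yet it cannot rank a single positive above a negative:

```python
import random


def calibration_table(probs, labels, n_bins=10):
    """Return (mean predicted prob, observed frequency, count) per non-empty bin."""
    bins = {}
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)
        bins.setdefault(b, []).append((p, y))
    table = []
    for b in sorted(bins):
        ps = [p for p, _ in bins[b]]
        ys = [y for _, y in bins[b]]
        table.append((sum(ps) / len(ps), sum(ys) / len(ys), len(ys)))
    return table


def auc(probs, labels):
    """Chance that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic labels (assumption): roughly 30% positives.
random.seed(0)
n = 1000
labels = [1 if random.random() < 0.3 else 0 for _ in range(n)]

# The "useless" forecaster: the same probability for everyone.
base_rate = sum(labels) / n
constant_probs = [base_rate] * n

table = calibration_table(constant_probs, labels)
constant_auc = auc(constant_probs, labels)

for mean_pred, observed, count in table:
    print(f"predicted={mean_pred:.3f} observed={observed:.3f} n={count}")
print(f"AUC={constant_auc:.2f}")
```

Predicted and observed frequencies match in the only occupied bin, so calibration is perfect, while the AUC sits at chance level. Calibration says nothing about how well the forecasts separate the classes.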
A madman with a PhD built an open-source, interactive visual encyclopedia for understanding how AI works. It's wild, go take a look.
Website: https://t.co/Rg2HICn8RG
Repo in the first comment.
I follow AI news closely because of its strong connection to chess. I consider this article by Ramón López de Mántaras required reading for anyone interested: https://t.co/An0OyH7iEI
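Simulated annealing is a powerful optimization technique inspired by the annealing process in metallurgy. It helps find solutions to complex problems by occasionally accepting worse solutions, which lets the search escape local optima. The probability of accepting a worse solution shrinks as the "temperature" is lowered, so the search gradually settles on a good solution.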
✔️ Simulated annealing can be used to solve combinatorial problems, such as the traveling salesman problem, by finding the shortest possible route that connects a set of points.
✔️ It is effective for problems with large solution spaces and when traditional optimization methods struggle.
❌ The algorithm's performance heavily depends on the cooling schedule and temperature decay. Choosing the wrong parameters can lead to poor solutions or unnecessarily long computation times.
❌ Simulated annealing typically converges more slowly than gradient-based algorithms, making it less efficient for problems where gradients are available.
The visualization from Wikipedia demonstrates how simulated annealing is applied to the traveling salesman problem, minimizing the length of a route that connects all 125 points. Source: https://t.co/evTZ5LHnCC
🔹 In R, the GenSA package allows for effective global optimization without requiring derivatives, making it ideal for non-differentiable problems.
🔹 In Python, the simanneal library offers a simple and effective way to implement simulated annealing for large-scale combinatorial optimization problems.
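For intuition, here is a minimal from-scratch sketch of simulated annealing on a small traveling salesman instance. It does not use GenSA or simanneal. The geometric cooling schedule, the segment-reversal move, the parameter values, and the 30 random points are all illustrative assumptions:

```python
import math
import random


def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n)
    )


def simulated_annealing(points, t_start=1.0, t_end=1e-3, steps=20000, seed=0):
    rng = random.Random(seed)
    n = len(points)
    order = list(range(n))  # start from the tour 0, 1, ..., n-1
    current = tour_length(points, order)
    best_order, best = order[:], current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / steps)
        # Propose a neighbor by reversing a random segment of the tour.
        i, j = sorted(rng.sample(range(n), 2))
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        candidate_length = tour_length(points, candidate)
        delta = candidate_length - current
        # Always accept improvements; accept worse tours with prob exp(-delta / t).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            order, current = candidate, candidate_length
            if current < best:
                best_order, best = order[:], current
    return best_order, best


# Hypothetical instance: 30 random points in the unit square.
point_rng = random.Random(1)
points = [(point_rng.random(), point_rng.random()) for _ in range(30)]
initial = tour_length(points, list(range(30)))
best_order, best = simulated_annealing(points)
print(f"initial length={initial:.2f}, best found={best:.2f}")
```

Changing `t_start`, `t_end`, or `steps` shows how sensitive the result is to the cooling schedule.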
For more insights on methods like simulated annealing, subscribe to my newsletter on Statistics, Data Science, R, and Python! Learn more: https://t.co/X93SeCe0rb
#Python #R4DS #RStats #datascienceenthusiast #coding
To perfectly understand a phenomenon is to perfectly compress it, to have a model of it that cannot be made any simpler.
If a DL model requires millions of parameters to model something that can be described by a differential equation of three terms, it has not really understood it, it has merely cached the data.
K-Means has two major problems:
- Number of clusters must be known
- Doesn't handle outliers
But there's a solution!
Introducing DBSCAN, a density-based clustering algorithm. 🚀
Read more...👇
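A rough from-scratch sketch of the DBSCAN idea, assuming Euclidean distance and a toy dataset of two tight grids plus one far-away point. The function name `dbscan` and the parameter names `eps` and `min_pts` are chosen here for illustration. Note that no cluster count is passed in, and the isolated point is labeled as noise:

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(points)
    # eps-neighborhoods; each neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is also a core point
        cluster += 1
    return labels


# Toy data (assumption): two 4x4 grids with 0.1 spacing and one outlier.
blob_a = [(x * 0.1, y * 0.1) for x in range(4) for y in range(4)]
blob_b = [(5 + x * 0.1, 5 + y * 0.1) for x in range(4) for y in range(4)]
outlier = [(2.5, 10.0)]
points = blob_a + blob_b + outlier

labels = dbscan(points, eps=0.15, min_pts=4)
print(labels)
```

The trade-off is that `eps` and `min_pts` now need tuning instead of the number of clusters.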
BS DETECTION DU JOUR
Fabiano must be the most stupid person to get an MD in the entire history of Medicine.
R^2 = .025 as ACKNOWLEDGED by the authors of the paper.
Has ZERO clinical & practical significance.
🚀 Mastering Boosting: See Functional Gradient Descent in Action
If you work in data science, one of the best ways to really understand an algorithm is to implement it from scratch.
Gradient Boosting Decision Trees (GBDT) – the engine behind CatBoost, XGBoost, and LightGBM – are often described as “just an ensemble of trees,” but their real power comes from the optimization process behind them: Functional Gradient Descent.
The video below visually walks through this idea step by step. 👇
🎬 What the Animation Shows
We follow a simple Gradient Boosting Regressor as it learns to fit a noisy, non-linear dataset:
1. Iteration 0 – The Starting Point
The model’s prediction (red line) is flat. It starts as a constant equal to the mean of the target values Y.
This is the initial function: F0(x).
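2. Gradient Step – Computing Pseudo-Residuals
The pseudo-residuals are the differences between the targets and the current predictions. For squared error loss, ½(y − F(x))², these residuals are exactly the negative gradient of the loss with respect to the current model's predictions.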
3. Weak Learner – Fitting a Tree to the Errors
A shallow decision tree h_m is then fit to these residuals.
This tree learns where the current model is making the largest errors and how to correct them.
4. Update Step – Correcting the Model
The ensemble is updated by adding a scaled version of this new tree:
F_m(x) = F_{m-1}(x) + η · h_m(x), where η is the learning rate.
As the video plays, you see the red prediction curve F_M gradually evolve from a flat line into a flexible function that closely tracks the underlying data.
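The four steps above can be sketched from scratch in a few lines. This is a simplified illustration, not the code behind the video: it assumes squared error loss, one-dimensional inputs, depth-1 trees (stumps) as weak learners, and a hypothetical noisy sine dataset. The names `fit_stump`, `gradient_boost`, and `learning_rate` are chosen here for illustration:

```python
import math
import random


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree to the residuals (assumes distinct x values exist)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    n = len(xs)
    total = sum(residuals)
    best_gain, best_split = -1.0, None
    left_sum = 0.0
    for k in range(1, n):
        left_sum += residuals[order[k - 1]]
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # cannot split between equal x values
        right_sum = total - left_sum
        # Maximizing this quantity minimizes the squared error of the split.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if gain > best_gain:
            threshold = (xs[order[k - 1]] + xs[order[k]]) / 2
            best_gain = gain
            best_split = (threshold, left_sum / k, right_sum / (n - k))
    threshold, left_value, right_value = best_split
    return lambda x: left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1):
    f0 = sum(ys) / len(ys)  # Step 1: constant model equal to the mean of Y
    preds = [f0] * len(xs)
    trees = []
    for _ in range(n_rounds):
        # Step 2: pseudo-residuals (negative gradient of squared error loss)
        residuals = [y - p for y, p in zip(ys, preds)]
        # Step 3: fit a weak learner to the residuals
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Step 4: add a scaled version of the new tree to the ensemble
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]

    def predict(x):
        return f0 + learning_rate * sum(tree(x) for tree in trees)

    return predict, f0


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical noisy, non-linear dataset.
rng = random.Random(0)
xs = [i / 10 for i in range(80)]
ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]

predict, f0 = gradient_boost(xs, ys)
mse_start = mse(ys, [f0] * len(xs))
mse_end = mse(ys, [predict(x) for x in xs])
print(f"training MSE: {mse_start:.4f} -> {mse_end:.4f}")
```

Swapping the residual computation for the negative gradient of another loss is what turns this into general functional gradient descent.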
📚 Want to Go Deeper with CatBoost?
If you’d like to turn this intuition into production-grade skills with modern gradient boosting, check out Mastering CatBoost Pro:
👉 https://t.co/CkduTSNbm0
Perfect if you want to truly understand what’s happening under the hood of boosting models—not just call .fit() and hope for the best.
Many dimensionality reduction algorithms share a few central principles.
1. Construct a graph that captures the data's local structure
2. Measure "geodesic" distances between points using the graph
3. Project the points to a lower dimension while preserving these distances
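A small sketch of these three steps in the style of Isomap, written in plain Python. The assumptions are stated loudly: a k-nearest-neighbor graph that must be connected, Floyd-Warshall shortest paths as the "geodesic" distances, and classical MDS reduced to a single output dimension via power iteration. The dataset (40 points on three quarters of a unit circle) and all function names are illustrative:

```python
import math
import random


def knn_graph(points, k):
    """Step 1: symmetric k-nearest-neighbor graph as a dense distance matrix."""
    n = len(points)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nearest:
            d = math.dist(points[i], points[j])
            graph[i][j] = graph[j][i] = d
    return graph


def geodesic_distances(graph):
    """Step 2: all-pairs shortest paths (Floyd-Warshall) along the graph."""
    n = len(graph)
    d = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d


def embed_1d(d, iters=200, seed=0):
    """Step 3: classical MDS to one dimension (assumes a connected graph)."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double-centered Gram matrix B = -1/2 * J * D^2 * J.
    b = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean) for j in range(n)]
        for i in range(n)
    ]
    # Power iteration for the leading eigenvector of B.
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(eigenvalue) * x for x in v]


# Illustrative data: 40 points along three quarters of a unit circle.
angles = [1.5 * math.pi * t / 39 for t in range(40)]
points = [(math.cos(a), math.sin(a)) for a in angles]

embedding = embed_1d(geodesic_distances(knn_graph(points, k=2)))
span = abs(embedding[-1] - embedding[0])
print(f"straight-line distance between endpoints: {math.dist(points[0], points[-1]):.2f}")
print(f"distance between endpoints in the 1-D embedding: {span:.2f}")
```

The embedding unrolls the arc: the endpoints end up roughly an arc length apart rather than a chord apart, which is the sense in which graph distances are preserved.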
The ladder of intelligence is the ladder of abstraction.
L1: Memorizing answers (no generalization)
L2: Interpolative retrieval of answers, pattern matching, memorizing answer-generating rules (local generalization)
L3: Synthesizing causal rules on the fly (strong generalization)
L4: Discovering general principles, metacognition (extreme generalization)
To achieve compounding AI you need to reach L4.
Here's a probability puzzle that breaks everyone's brain:
How many people do you need in a room for a >50% chance that at least two of them share a birthday?
What's your guess? 100? 150? 183?
The answer is shockingly small.
[1/5]
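For anyone who wants to check the answer numerically, here is a short sketch under the usual simplifying assumptions: 365 equally likely birthdays, no leap years, and independent people. The function name `p_shared_birthday` is illustrative:

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


# Smallest room size with a better-than-even chance of a shared birthday.
smallest = next(n for n in range(1, 367) if p_shared_birthday(n) > 0.5)
print(smallest, round(p_shared_birthday(smallest), 4))
```

The trick is to compute the complement, the chance that all birthdays are distinct, and subtract it from 1.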
I spent months illustrating how Transformers actually work.
Not just what they do, but why they’re built this way. The history, design choices, and intuition behind every layer.
From RNNs → Attention → Multi-Head → FFNs → Positional Encoding.
Here's everything I wish I knew: