Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
Its capabilities exceed those of any model we’ve ever made generally available.
Announcing ARC-AGI-3
The only unsaturated agentic intelligence benchmark in the world
Humans score 100%, AI <1%
This human-AI gap demonstrates we do not yet have AGI
Most benchmarks test what models already know; ARC-AGI-3 tests how they learn
@pedma7 Ah good point, you have to wait a bit longer to see enough samples. Yeah I can usually detect if something is off pretty quickly, fortunately saves the drawdown
Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use.
Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.
@kareem_carr Agreed, also I think it’s important to look at who’s teaching. I found that I learned mathematics significantly better when it was taught by a professor who actually did research in math, and not some adjacent field like engineering
The first question you'll need to answer when looking for a job in Machine Learning:
How do you deal with an imbalanced dataset?
Let's discuss 7 different ways to deal with this problem.
Imagine you have pictures of cats and dogs. Your dataset has 950 cat pictures and only 50 dog pictures.
That's an imbalanced dataset. There's a significant difference in the number of samples for each class.
Imagine a model that classifies every picture in the dataset as a cat. Such a model will be 95% accurate and never identify a dog! You can write a dumb function that always returns "CAT", and it will be correct 95% of the time!
That's a big problem. On its own, accuracy is a poor measure of how good a model is when working on an imbalanced task. Instead, look at any of the following:
• Precision
• Recall
• F-Score
• Confusion Matrix
• ROC Curves
• A combination of these
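To make the accuracy trap concrete, here is a minimal sketch in plain Python that scores the always-"CAT" function on the 950/50 split, treating dog as the positive class. The function names and the choice to report a ratio as 0.0 when its denominator is zero are assumptions made for illustration.

```python
# Hypothetical labels matching the example: 950 cats and 50 dogs.
y_true = ["cat"] * 950 + ["dog"] * 50


def always_cat(_picture):
    """The 'dumb' baseline: ignore the input and answer cat."""
    return "cat"


y_pred = [always_cat(None) for _ in y_true]


def scores(y_true, y_pred, positive="dog"):
    """Accuracy, precision, recall and F1 for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Assumption: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


result = scores(y_true, y_pred)
print(result)
```

Accuracy comes out at 0.95, while precision, recall, and F1 for the dog class are all 0.0, which is exactly what the accuracy number hides.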
The second strategy is to collect more data: If you can find more dog pictures, do that.
Sometimes this is impossible, but the simplest solution is often the most effective.
If you can't collect more data, consider augmenting the dataset with synthetic samples. This is not always possible, but if you can create realistic samples, take advantage of it.
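For pictures, one simple way to create extra samples is to mirror the ones you already have. The sketch below assumes an image is stored as a list of pixel rows; the `flip_horizontal` helper and the tiny 2x3 "dog picture" are hypothetical, and a real pipeline would typically use an image library instead.

```python
def flip_horizontal(image):
    """Return a mirrored copy of an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in image]


# Hypothetical 2x3 grayscale "dog picture".
dog = [
    [0, 50, 255],
    [10, 60, 200],
]

# The original plus its mirrored copy: two samples from one picture.
augmented = [dog, flip_horizontal(dog)]
print(augmented[1])
```

Whether a flipped picture still counts as realistic depends on the task, so check a few generated samples by eye before training on them.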
Another way to work around an imbalanced dataset is to resample your data. You can do any of the following:
• Oversample the pictures of dogs.
• Undersample the pictures of cats.
• Do a little bit of both.
For example, you can use every dog picture four times and half of the cat pictures. Your final dataset will have 400 dogs (50 × 4) and 475 cats (950 ÷ 2).
Oversampling and undersampling both introduce bias into your dataset because you are changing the data distribution. Be careful with this.
Another approach is to leave the dataset alone and focus on the algorithm you use to process it.
First, you can weight each class differently so the model pays more or less attention to those samples. For example, you can use a larger weight for dogs to compensate for the lack of samples.
The algorithm you use plays a vital role. For example, Decision Trees often handle imbalanced classes well. Neural networks, not so much.
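The thread doesn't say how to pick the weights. One common heuristic, assumed here, is inverse class frequency: total samples divided by (number of classes times samples in that class). This is the same formula scikit-learn documents for `class_weight="balanced"`. A plain-Python sketch:

```python
from collections import Counter

# Hypothetical labels matching the example: 950 cats and 50 dogs.
labels = ["cat"] * 950 + ["dog"] * 50
counts = Counter(labels)

n_samples = len(labels)
n_classes = len(counts)

# Inverse-frequency weight: rare classes get proportionally larger weights.
class_weights = {
    label: n_samples / (n_classes * count)
    for label, count in counts.items()
}
print(class_weights)
```

With these numbers a dog sample weighs 10.0 and a cat sample about 0.53, so both classes contribute the same total weight to the loss.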
Finally, ensure you frame the problem correctly.
For example, I've seen people frame an anomaly detection problem as a multi-class classification problem. That's the wrong approach.
You have to understand what problem you are trying to solve before deciding how to do it.
I partnered with Synthetic Mind to bring you this post. It's a free AI newsletter with 70,000+ subscribers. Subscribe, and you’ll get a free guide on turning ChatGPT into your personal assistant:
https://t.co/bxLa2vs850
Let's recap the seven different techniques you can use to handle imbalanced datasets:
1. Pick the appropriate performance metric
2. Collect more data
3. Generate synthetic data
4. Resample the dataset
5. Use different weights
6. Try different algorithms
7. Frame the problem correctly
Is there anything else you can do to work with imbalanced datasets?