Bryan Yates @bryanyates - Twitter Profile

Pinned Tweet

Bryan Yates

@bryanyates

over 3 years ago

Still one of the best reads on computer memory and the importance of caches.

10

1K

161

775

101K

bryanyates retweeted

Claude

@claudeai

15 days ago

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

5K

105K

15K

22K

57M

bryanyates retweeted

ARC Prize

@arcprize

3 months ago

Announcing ARC-AGI-3: the only unsaturated agentic intelligence benchmark in the world. Humans score 100%, AI <1%. This human-AI gap demonstrates we do not yet have AGI. Most benchmarks test what models already know; ARC-AGI-3 tests how they learn.

248

4K

576

933

743K

Bryan Yates

@bryanyates

over 1 year ago

tmux + vim is all you need

0

2

625

Bryan Yates

@bryanyates

over 1 year ago

@sama Aka, straight to prod

0

4

0

185

Bryan Yates

@bryanyates

over 1 year ago

@svpino Moving towards that fast in a lot of fields

0

4

0

1

606

Bryan Yates

@bryanyates

over 1 year ago

@pedma7 Ah good point, you have to wait a bit longer to see enough samples. Yeah I can usually detect if something is off pretty quickly, fortunately saves the drawdown

0

58

bryanyates retweeted

Anthropic

@AnthropicAI

over 1 year ago

Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.

https://t.co/ZlywNPVIJP

465

10K

2K

3K

4M

Bryan Yates

@bryanyates

over 1 year ago

Paper: https://t.co/ekffj9SBiZ

0

267

Bryan Yates

@bryanyates

over 1 year ago

Came across this interesting paper from a while back, still an informative read on transformer applications in time series forecasting.

https://t.co/EtVomkrv2o

1

3

471

Bryan Yates

@bryanyates

over 1 year ago

Huge achievement 🔥

SpaceX

@SpaceX

over 1 year ago

Mechazilla has caught the Super Heavy booster!

11K

247K

61K

20K

45M

0

1

0

614

Bryan Yates

@bryanyates

almost 3 years ago

@tunguz My sharpe ratio isn’t high enough

0

2

0

280

Bryan Yates

@bryanyates

almost 3 years ago

@Vertox_DF Sounds like you've been in the arena trying things

0

1

0

277

Bryan Yates

@bryanyates

almost 3 years ago

@kareem_carr Agreed, also I think it’s important to look at who’s teaching. I found that I learned mathematics significantly better when it was taught by a professor who actually did research in math, and not some adjacent field like engineering

1

6

0

1

949

Bryan Yates

@bryanyates

almost 3 years ago

@macrocephalopod For market making, add to that "where am I getting run over?"

0

4

0

1

1K

Bryan Yates

@bryanyates

almost 3 years ago

@yi__tang Yeah usually end up trying a number of approaches, up/down sample, loss, different models. Really depends

0

1

0

92

Bryan Yates

@bryanyates

almost 3 years ago

Always a good quant interview question. Find imbalanced datasets across many areas in finance.

Santiago

@svpino

almost 3 years ago

The first question you'll need to answer when looking for a job in Machine Learning: How do you deal with an imbalanced dataset? Let's discuss 7 different ways to deal with this problem.

Imagine you have pictures of cats and dogs. Your dataset has 950 cat pictures and only 50 dog pictures. That's an imbalanced dataset. There's a significant difference in the number of samples for each class.

Imagine a model that classifies every picture of the dataset as a cat. Such a model will be 95% accurate and never identify a dog! You can create a dumb function that always returns "CAT" and will be correct 95% of the time! That's a big problem.

Accuracy is never a good metric to measure how good a model is when working on an imbalanced task. Instead, look at any of the following:

• Precision
• Recall
• F-Score
• Confusion Matrix
• ROC Curves
• A combination of these

The second strategy is to collect more data: If you can find more dog pictures, do that. Sometimes this is impossible, but the simplest solution is often the most effective.

If you can't collect more data, consider augmenting the dataset with synthetic samples. This is not always possible, but if you can create realistic samples, take advantage of it.

Another way to work around an imbalanced dataset is to resample your data. You can do any of the following:

• Oversample the pictures of dogs.
• Undersample the pictures of cats.
• Do a little bit of both.

For example, you can use every dog picture four times and half of the cat pictures. Your final dataset will have 200 dogs (50 × 4) and 475 cats (950 ÷ 2).

Over and undersampling introduce biases into your dataset. You are changing the data distribution. Be careful with this.

Another approach is to leave the dataset alone and focus on the algorithm you use to process it. First, you can weigh each class differently to have a model pay more or less attention to those samples. For example, you can use a larger weight for dogs to compensate for the lack of samples.
The algorithm you use plays a vital role. For example, Decision Trees are excellent at handling imbalanced classes. Neural networks, not so much.

Finally, ensure you frame the problem correctly. I've seen people framing an anomaly detection problem as a multi-class classification. That's the wrong approach. You have to understand what problem you are trying to solve before deciding how to do it.

I partnered with Synthetic Mind to bring you this post. It's a free AI newsletter with 70,000+ subscribers. Subscribe, and you’ll get a free guide on turning ChatGPT into your personal assistant: https://t.co/bxLa2vs850

Let's recap the seven different techniques you can use to handle imbalanced datasets:

1. Pick the appropriate performance metric
2. Collect more data
3. Generate synthetic data
4. Resample the dataset
5. Use different weights
6. Try different algorithms
7. Frame the problem correctly

Is there anything else you can do to work with imbalanced datasets?
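
The cat-and-dog numbers in the thread above can be checked with a few lines of plain Python. This is a minimal sketch, not code from the thread: the helper names (`accuracy`, `recall`, `balanced_class_weights`) are illustrative, and the weighting rule (total samples divided by the number of classes times the class count) is one common heuristic assumed here, not something the author specifies.

```python
from collections import Counter

# Toy labels matching the thread's example: 950 cats, 50 dogs.
labels = ["cat"] * 950 + ["dog"] * 50

# The "dumb function" from the thread: always answer CAT.
predictions = ["cat"] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of actual `positive` samples that were predicted as `positive`."""
    true_positives = sum(
        t == positive and p == positive for t, p in zip(y_true, y_pred)
    )
    actual_positives = sum(t == positive for t in y_true)
    return true_positives / actual_positives if actual_positives else 0.0


def balanced_class_weights(y):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


acc = accuracy(labels, predictions)             # high, despite a useless model
dog_recall = recall(labels, predictions, "dog")  # no dog is ever found
weights = balanced_class_weights(labels)         # rare class gets a larger weight

# Resampling arithmetic: use every dog picture four times, keep half the cats.
counts = Counter(labels)
resampled = {"dog": counts["dog"] * 4, "cat": counts["cat"] // 2}

print(f"accuracy={acc:.2f}, dog recall={dog_recall:.2f}")
print(f"class weights={weights}")
print(f"resampled counts={resampled}")
```

The always-CAT model scores 95% accuracy while its recall on dogs is zero, which is why the thread recommends precision, recall, and related metrics over accuracy. The weights dictionary gives dogs a much larger weight than cats, which is the idea behind the "use different weights" technique.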