Botter

@bottergpt

Nature and Landscape Photographer. Getty Images & VCG Contributor. Deep Learning Engineer. Life is short and the world is wide.

People's Republic of China

Joined August 2015

66 Following

78 Followers

219 Posts

bottergpt retweeted

Yun-Ta Tsai

@yunta_tsai

2 days ago

Many people think any given ML project is 99% training. In reality, it’s 50% evaluation, 40% data cleaning, 8% integration, and 2% training. The first two set the noise floor for learning. No ML magic matters; the model cannot lower the noise floor, as that’s the optimal bound of Shannon encoding of your data. Thus, not a single day goes by without me thinking about ontology. Even the old labels have to be constantly reviewed.

bottergpt retweeted

Sebastian Raschka

@rasbt

over 1 year ago

Every Spring, I'm excited to read through the comprehensive ~100-page "State of Machine Learning Competitions" report, which offers many interesting insights into current trends, useful tools, and emerging methodologies in the field. Below are some key takeaways from the latest 2024 report (https://t.co/ERfoItrYFh):

1) Language & frameworks
- Python remains the dominant language, with 76 out of 79 winning solutions.
- PyTorch continues to be the deep learning framework of choice, with 53 out of 60 deep learning competition winners.

2) Hardware trends
- Over 80% of winning teams used NVIDIA GPUs (with A100s being the most popular)
- Interestingly, there's still no mention of AMD GPUs.
- I'm surprised no solution utilized more than an 8xH100 server, which suggests that multi-node setups are either underutilized or underreported.

3) Efficiency techniques
- Techniques like LoRA are still popular choices for reducing training compute requirements, but many now opt for full finetuning for improved modeling performance.
- And 8-bit and 4-bit quantization remain the most popular approaches for lowering inference compute requirements.

4) LLM reasoning
- The integration of chain-of-thought reasoning and inference-time scaling has already made its way into competitions. But these approaches currently rely on simplistic majority voting rather than advanced verifier LLMs (I expect more sophisticated implementations soon).

5) Computer vision
- Interestingly, most winning solutions in computer vision competitions are CNN- rather than transformer-based.
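Point 3 mentions 8-bit and 4-bit quantization as the usual way to lower inference compute. As a rough illustration of the basic idea only, here is a minimal pure-Python sketch of symmetric absmax quantization with a single scale per tensor. The names `quantize` and `dequantize` are hypothetical, and this is a deliberate simplification: production libraries typically use per-channel or block-wise scales and special handling for outliers, so this should not be read as what any winning solution or specific library does.

```python
from typing import List, Tuple


def quantize(weights: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric absmax quantization with one scale for the whole tensor.

    The largest |w| is mapped to the largest signed integer level
    (127 for 8-bit, 7 for 4-bit); every weight is rounded to the nearest level.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid dividing by zero for an all-zero (or empty) tensor.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # The clamp only guards against float round-off pushing a value past qmax.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer levels back to approximate float weights."""
    return [v * scale for v in q]


# Example: quantize a small weight vector to 8 and 4 bits and measure the error.
weights = [0.5, -1.0, 0.25, 0.0, 0.731]
for bits in (8, 4):
    q, scale = quantize(weights, bits)
    restored = dequantize(q, scale)
    max_err = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"{bits}-bit levels={q} scale={scale:.5f} max_error={max_err:.5f}")
```

Storing each weight as an 8-bit integer instead of a 32-bit float cuts weight memory by 4x (8x at 4 bits if two values are packed per byte), at the cost of a rounding error of at most half a quantization step per weight.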

Bonus: In one of the chapters of my LLM book, I described training a decoder-style LLM (GPT) for classification, which is a concept that surprised many readers. Interestingly, the report mentioned that many NLP competitions used decoder-style LLMs for classification tasks as well:

> [...] several competitions seemed designed specifically with these powerful new decoder LLMs in mind. [...] The most commonly-used decoder models among competition winners in 2024 were variants of Llama, Mistral, Gemma, Qwen, and DeepSeek models. Several competition winners used only decoder models.

However, I recently saw the release of ModernBERT by Jeremy Howard's team, and I recommend at least trying this new encoder-style model before jumping to (often larger) decoder-style LLMs.
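Point 4 notes that the chain-of-thought and inference-time scaling approaches in the report still rely on simple majority voting. A minimal sketch of that idea (often called self-consistency): sample several reasoning paths for the same prompt, extract each final answer, and return the most frequent one. The `sample` callable, the `Answer: <x>` output convention, and all function names here are hypothetical assumptions for illustration, not taken from the report or any competition solution.

```python
import re
from collections import Counter
from typing import Callable, List, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final answer out of a chain-of-thought completion.

    Assumes (hypothetically) the model was prompted to end with 'Answer: <x>';
    if several such lines appear, the last one wins.
    """
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def majority_vote(completions: List[str]) -> Optional[str]:
    """Return the most common extracted answer, skipping unparseable samples."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    # Ties go to the answer seen first (Counter.most_common keeps insertion order).
    return Counter(answers).most_common(1)[0][0]


def self_consistency(sample: Callable[[str], str], prompt: str, n: int = 8) -> Optional[str]:
    """Inference-time scaling: draw n stochastic samples, then vote on the answers."""
    return majority_vote([sample(prompt) for _ in range(n)])
```

A verifier-based setup, which the tweet expects to show up in competitions, would replace the `Counter` vote with a second model that scores each candidate; this sketch does not attempt that.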

Botter @bottergpt

about 2 years ago

Just got back from an amazing trip to South Korea! The beaches in Busan were stunning; we spent the evenings by the sea, enjoying the breeze, listening to music, and watching fireworks in the distance. The sound of the waves was truly beautiful. https://t.co/IG7gBI2xBX

bottergpt retweeted

Andrew Ng

@AndrewYNg

about 2 years ago

We just released a new climate emulator to explore the application of Stratospheric Aerosol Injection (SAI) to mitigate global warming! SAI uses reflective particles in the atmosphere to reflect sunlight and thereby cool Earth’s surface. Our emulator lets you explore how different ways to apply SAI might affect average global temperature. Please check out the emulator at https://t.co/OxtaQMyDuL. SAI is a promising direction, but we still need more research to better understand its impact and potential implementation. Big thanks to collaborators @jeremy_irvin16 @DanVisioni Ben Kravitz @dakotagruener @chrisroadmap and @DWatsonParris

bottergpt retweeted

Andrej Karpathy

@karpathy

over 2 years ago

Nice read on the rarely-discussed-in-the-open difficulties of training LLMs. Mature companies have dedicated teams maintaining the clusters. At scale, clusters leave the realm of engineering and become a lot more biological, hence e.g. teams dedicated to "hardware health".

It can be a frustrating daily life experience of training large models to "babysit" the training run. You're there carefully monitoring the vital signs of your run: loss spikes, numerical issues, throughput, gradient norms, policy entropy, etc. Every time the run degrades or flatlines (can happen often), you quickly look for the stack trace to see what's up. You have to do this fast or 10,000 GPUs could be idling. Often, it is a new, exotic, scary-looking error you've never seen before so you summon help to see if anyone can see what's up. The worst ones like to occur at 4am. Often no one can, so you just ban some nodes that look a bit sketchy and try to restart the run. Sometimes the run goes down just because you have not earned the favors of your gods that day, so you put a while True: loop around your launch command.

The underlying issues can be highly diverse, from some GPUs just getting a bit too hot and suddenly doing incorrect multiplication once in a while, to some router going down and decreasing the networked file system I/O, to someone in the datacenter physically disconnecting a wire as part of an un-communicated maintenance. Sometimes you'll never know.

Another necessary related citation here is the famous OPT-175B logbook and I'd hope more like it can see the light of day in the future. (see chronicles/OPT175B_Logbook.pdf in the git repo) https://t.co/6xOHVtj0Gf

TLDR LLM training runs are significant stress-tests of an overall fault tolerance of a large computing system acting as a biological entity. And when you're shopping around for your compute, think about a lot more than just FLOPs and $. Think about the whole service from hardware to software across storage, networking, and compute. And think about whether the team maintaining it looks like The Avengers and whether you could become best friends.
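The while True: joke describes a real pattern: automatically relaunching a run that crashed. A minimal sketch, assuming the training script can resume from its latest checkpoint on its own. The function name `run_with_restarts`, the restart cap, and the linear backoff are illustrative choices (a bare while True: would retry forever), not how any particular lab does it.

```python
import time
from typing import Callable


def run_with_restarts(launch: Callable[[], int],
                      max_restarts: int = 5,
                      backoff_s: float = 30.0) -> int:
    """Re-run `launch` until it exits with code 0 or restarts are exhausted.

    `launch` starts (or resumes) the training run and returns a process exit
    code, where 0 means the run finished cleanly. Returns the number of
    restarts that were needed; raises if the run keeps failing.
    """
    restarts = 0
    while True:
        code = launch()
        if code == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                f"run still failing after {restarts} restarts (exit code {code})")
        restarts += 1
        # Wait a little longer each time so flaky nodes or network can recover.
        time.sleep(backoff_s * restarts)


# Usage (hypothetical command; assumes train.py resumes from its last checkpoint):
#   import subprocess
#   cmd = ["python", "train.py", "--resume"]
#   run_with_restarts(lambda: subprocess.run(cmd).returncode)

# Simulated flaky run: fails twice (e.g. a sketchy node), then succeeds.
outcomes = iter([1, 1, 0])
print(run_with_restarts(lambda: next(outcomes), backoff_s=0.0))  # prints 2
```

Capping restarts keeps a genuinely broken job from silently burning GPU hours, which is the failure mode a bare infinite loop invites.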

Botter @bottergpt

over 2 years ago

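@hydantess1993 You still need to get enough sleep; otherwise it's pretty hard on your heart. I'd suggest adding supplements and racket sports.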

Botter @bottergpt

over 2 years ago

@pengchujin My previous Apple ID was topped up with gift cards, and it got banned later 🥹

Botter @bottergpt

almost 3 years ago

@yetone @mrfishyu So a Hong Kong card works?

Botter @bottergpt

almost 3 years ago

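@taresky @sohuko104 @OfflineHelper @passluo Take a look at Getty Images, the world's largest commercial stock photo library. VCG's stock image business grew out of Huagai Creative (华盖创意), which was Getty's China operation.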

Botter @bottergpt

almost 3 years ago

@jesselaunz Is it because of portrait rights?

Botter @bottergpt

almost 3 years ago

Regardless of the circumstances, take action without overthinking. Understand the essence of this statement, and prioritize action over words. Act, act, act!

Botter @bottergpt

almost 3 years ago

My pessimism about this country continues to grow, and I'm beginning to wonder if I should explore better options.

Botter @bottergpt

almost 3 years ago

In this way, I can get what I need in most cases. Now, when I work, I always have a screen with ChatGPT open, and I use Genie and Copilot in VSCode for assistance. It's quite enjoyable, and my workflow has completely changed.

Botter @bottergpt

almost 3 years ago

I just realized that GPT can actually save me a lot of time.

Botter @bottergpt

almost 3 years ago

But GPT may have already read most of the things I search for and can provide me with a summary directly. I can then have it validate whether there are any missing details or correct any errors.

Botter @bottergpt

almost 3 years ago

@turingou NVDA SOXL?

Botter @bottergpt

almost 3 years ago

@lzwjava Congrats!🥳🥳

Botter @bottergpt

about 3 years ago

@hydantess1993 cx is a current colleague and fish a former one; you're all heavyweights 🫣
