Blake Danson

over 1 year ago

When I look outside and hear people grumbling, anxious, or pessimistic — while breathing in clean air underneath the warmth of a dependable sun — I can’t help but wonder how many moments are being wasted between tragedies.

blakedanson retweeted

over 1 year ago

Folks seem to rediscover this every couple of years. As I’ve been saying for many years, you have to track token accuracy, not loss/perplexity, otherwise you’ll wrongly think the validation loss going up is a bad thing.
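A minimal sketch of how the two metrics can diverge, using made-up probabilities rather than output from any real model. If a later checkpoint gets more tokens right but is confidently wrong on one, mean cross-entropy can rise while token accuracy improves. The function names and toy distributions are illustrative assumptions.

```python
import math

def cross_entropy(probs, targets):
    """Mean negative log-likelihood assigned to the target tokens."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

def token_accuracy(probs, targets):
    """Fraction of positions where the most likely token equals the target."""
    hits = sum(
        1 for p, t in zip(probs, targets)
        if max(range(len(p)), key=p.__getitem__) == t
    )
    return hits / len(targets)

targets = [0, 1, 2, 0]

# Hypothetical earlier checkpoint: hedged predictions, 2 of 4 tokens correct.
early = [
    [0.40, 0.30, 0.30],
    [0.30, 0.40, 0.30],
    [0.40, 0.30, 0.30],  # wrong, but not by much
    [0.30, 0.40, 0.30],  # wrong, but not by much
]

# Hypothetical later checkpoint: 3 of 4 correct, 1 confidently wrong.
late = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.01, 0.01, 0.98],
    [0.001, 0.998, 0.001],  # confidently wrong, dominates the loss
]

for name, probs in (("early", early), ("late", late)):
    print(name, round(cross_entropy(probs, targets), 3), token_accuracy(probs, targets))
```

In this toy case the single confidently wrong token dominates the average loss, which is why loss alone can be misleading here.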

over 1 year ago

I feel like we are going to see more APIs that accept embeddings as inputs and return response streams from various token prediction models, of all kinds. Models communicate with ensemble models via embeddings, so it makes sense to ensemble over an API and save on processing steps.

blakedanson retweeted

elvis

@omarsar0

over 1 year ago

A nice example of where language AI agents can outperform frontier LLMs and even human experts. This paper introduces Aviary, an extensible open-source gymnasium for language agents. They show that language agents backed by open-source, non-frontier LLMs can match and exceed both frontier LLM agents and human experts on multiple scientific tasks at up to 100x lower inference cost.

over 1 year ago

If AI agents are the next big wave of innovation, software developers should consider focusing on writing backend services to be exclusively consumed by AI agents instead of humans. I suspect AI agent tools will turn into API calls to different vendors, opening up a new market for startups.

blakedanson retweeted

Haider.

@haider1

over 1 year ago

🚨 NVIDIA Introduces Jetson Nano Super
> compact AI computer capable of 70 trillion operations per second
> designed for robotics, it supports advanced models, including LLMs, and costs $249

over 1 year ago

Something you learn about the world as you train machine learning models is that failure isn’t binary. It’s a gradient. We shouldn’t avoid failure; that would guarantee our model doesn’t learn. Instead, fail better with each attempt. This analogy works best if your objective remains fixed (i.e., you make multiple attempts at one specific task).

blakedanson retweeted

almost 2 years ago

Something that nearly everyone is sleeping on is the importance of prompt caching. We've just added support for it to Claudette, so @AnthropicAI caching is now *very* easy to use -- cached tokens are 90% cheaper, and faster! Docs here: https://t.co/93AyBsDdEr
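A rough back-of-the-envelope sketch of what "90% cheaper" means for a request that reuses a long prompt prefix. Only the 90% figure comes from the post. The per-token price, the token counts, and the function name are hypothetical, and the sketch ignores any other pricing details such as the cost of writing to the cache.

```python
def prompt_cost(prefix_tokens, new_tokens, price_per_token,
                cached_discount=0.90, cache_hit=True):
    """Input cost of one request made of a reusable prefix plus new text.

    cached_discount=0.90 mirrors the "90% cheaper" figure in the post.
    price_per_token is a hypothetical placeholder, not a published rate.
    """
    if cache_hit:
        prefix_rate = price_per_token * (1 - cached_discount)
    else:
        prefix_rate = price_per_token
    return prefix_tokens * prefix_rate + new_tokens * price_per_token

PRICE = 3e-6  # hypothetical dollars per input token

uncached = prompt_cost(10_000, 200, PRICE, cache_hit=False)
cached = prompt_cost(10_000, 200, PRICE, cache_hit=True)
savings = 1 - cached / uncached

print(f"uncached: ${uncached:.4f}, cached: ${cached:.4f}, saved: {savings:.0%}")
```

The overall saving is a little under 90% because the new, uncached tokens are still billed at the full rate.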

blakedanson retweeted

Unitree

@UnitreeRobotics

almost 2 years ago

[Open Source] Unitree First View Teleoperation for Humanoid Robots

To make data collection for humanoid robots more convenient, we drew on other solutions, adapted them, and are releasing the result as open source.

GitHub: https://t.co/KvSc8pV0O9

#Unitree #Humanoid #AGI #AI #EmbodiedIntelligence #Manipulation #Teleoperation #DataCollection

blakedanson retweeted

Santiago

@svpino

almost 2 years ago

The single most undervalued fact of linear algebra: Matrices are graphs, and graphs are matrices. Encoding matrices as graphs is a cheat code, making complex behavior simple to study. Let me show you how!
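One well-known instance of this correspondence, sketched under the assumption of a small made-up directed graph (not necessarily the example used in the thread): an edge list converts to an adjacency matrix and back, and entry (i, j) of the squared matrix counts the walks of length 2 from node i to node j. The helper names are illustrative.

```python
def edges_to_matrix(n, edges):
    """Adjacency matrix of a directed graph on nodes 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for src, dst in edges:
        a[src][dst] = 1
    return a

def matrix_to_edges(a):
    """Read a matrix as a graph: every nonzero entry (i, j) is an edge i -> j."""
    return [(i, j) for i, row in enumerate(a) for j, w in enumerate(row) if w != 0]

def matmul(a, b):
    """Plain triple-loop matrix product for small lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

# Toy graph: 0 -> 1, 1 -> 2, 2 -> 0, 0 -> 2
edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
A = edges_to_matrix(3, edges)
A2 = matmul(A, A)  # A2[i][j] = number of length-2 walks from i to j

print(A)
print(A2)
```

For instance, `A2[0][2]` is 1 because the only two-step route from node 0 to node 2 is 0 -> 1 -> 2.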

blakedanson retweeted

Together AI @togethercompute

almost 2 years ago

Woohoo gpu.cpp has now been officially released! This is a really exciting project which provides a device-independent way to use GPU compute. No more writing separate code for CUDA, AMD, Mac, and Intel GPUs! We can't wait to see what you all are able to create with this.

blakedanson retweeted

Andrej Karpathy

@karpathy

almost 2 years ago

In 2019, OpenAI announced GPT-2 with this post:
https://t.co/jjP8IXmu8D

Today (~5 years later) you can train your own for ~$672, running on one 8XH100 GPU node for 24 hours. Our latest llm.c post gives the walkthrough in some detail:
https://t.co/XjLWE2P0Hp

Incredibly, the costs have come down dramatically over the last 5 years due to improvements in compute hardware (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention) and data quality (e.g. the FineWeb-Edu dataset). For this exercise, the algorithm was kept fixed and follows the GPT-2/3 papers.

Because llm.c is a direct implementation of GPT training in C/CUDA, the requirements are minimal - there is no need for conda environments, Python interpreters, pip installs, etc. You spin up a cloud GPU node (e.g. on Lambda), optionally install NVIDIA cuDNN, NCCL/MPI, download the .bin data shards, compile and run, and you're stepping in minutes. You then wait 24 hours and enjoy samples about English-speaking Unicorns in the Andes.

For me, this is a very nice checkpoint to get to because the entire llm.c project started with me thinking about reproducing GPT-2 for an educational video, getting stuck with some PyTorch things, then rage quitting to just write the whole thing from scratch in C/CUDA. That set me on a longer journey than I anticipated, but it was quite fun, I learned more CUDA, I made friends along the way, and llm.c is really nice now. It's ~5,000 lines of code, it compiles and steps very fast so there is very little waiting around, it has constant memory footprint, it trains in mixed precision, distributed across multi-node with NCCL, it is bitwise deterministic, and hovers around ~50% MFU. So it's quite cute.

llm.c couldn't have gotten here without a great group of devs who assembled from the internet, and helped get things to this point, especially ademeure, ngc92, @gordic_aleksa, and rosslwheeler. And thank you to @LambdaAPI for the GPU cycles support.

There's still a lot of work left to do. I'm still not 100% happy with the current runs - the evals should be better, the training should be more stable especially at larger model sizes for longer runs. There's a lot of interesting new directions too: fp8 (imminent!), inference, finetuning, multimodal (VQVAE etc.), more modern architectures (Llama/Gemma). The goal of llm.c remains to have a simple, minimal, clean training stack for a full-featured LLM agent, in direct C/CUDA, and companion educational materials to bring many people up to speed in this awesome field.

Eye candy: my much longer 400B token GPT-2 run (up from 33B tokens), which went great until 330B (reaching 61% HellaSwag, way above GPT-2 and GPT-3 of this size) and then exploded shortly after this plot, which I am looking into now :)
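As a quick check on the headline figure, the arithmetic behind "~$672 on one 8XH100 node for 24 hours" can be sketched as below. The per-GPU-hour rate is only what those three numbers imply. It is not a price quoted in the post, and the function name is illustrative.

```python
def implied_gpu_hour_rate(total_cost_usd, gpus, hours):
    """Dollars per GPU-hour implied by a total bill for `gpus` GPUs over `hours` hours."""
    return total_cost_usd / (gpus * hours)

# Figures quoted in the post: ~$672, one node with 8 H100 GPUs, 24 hours.
gpu_hours = 8 * 24
rate = implied_gpu_hour_rate(672, gpus=8, hours=24)

print(f"{gpu_hours} GPU-hours at about ${rate:.2f} per H100-hour")
```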

blakedanson retweeted

clem 🤗

@ClementDelangue

about 2 years ago

GPU-Poor no more: super excited to officially release ZeroGPU in beta today. Congrats @victormustar & team for the release!

In the past few months, the open-source AI community has been thriving. Not only Meta but also Apple, NVIDIA, Bytedance, Snowflake, Databricks, Microsoft, Google, and more have released open models and datasets on Hugging Face, which now hosts over 1M models on the Hub which have been downloaded over a billion times. More than that, many are starting to be better than proprietary APIs.

This movement has been supported not only by big tech but also by a thriving open-source AI community that includes academic labs, startups, and independent hobbyists. For example, more than 35,000 variation models of Llama have been shared on Hugging Face since Meta’s first version a year ago—including more than 7,000 based on Llama-3—ranging from quantized and merged models to specialized models in biology and Mandarin, to name a few. More than 4 million AI builders are now using Hugging Face.

However, the open-source community doesn’t have the same resources available to train and demo these models that big tech have at their disposal, which is why ChatGPT remains the most used AI application today.

@huggingface is fighting this by launching ZeroGPU, a shared infrastructure for indie and academic AI builders to run AI demos on Spaces, giving them the freedom to pursue their work without the financial burden of compute costs. Spaces have been the most popular way to build AI demos, with over 300,000 AI demos created so far on CPU or paid GPU (and a thousand more every day). To foster the continued development of the AI ecosystem, Hugging Face is committing $10M of free GPUs with the launch today of ZeroGPU.

Technically speaking, ZeroGPU leverages Hugging Face's experience in hosting and serving more than 100 Petabytes monthly from the Hugging Face Hub. ZeroGPU allows Spaces to run on multiple GPUs by making Spaces efficiently hold and release GPUs as needed (as opposed to a classical GPU Space that holds exactly one GPU at any time). This architecture is also more energy-efficient since GPUs are shared rather than duplicated. ZeroGPU uses @nvidia A100 GPU devices under the hood.

You can learn more about ZeroGPU here: https://t.co/1mxUxXmElv

More than 1,300 ZeroGPU spaces have been built since we started giving early access to AI builders on May 1, 2024: https://t.co/XvJ2MkcK7R

You can explore some examples from @victormustar: https://t.co/b8SUcRelJf

You can find the article from @kyliebytes: https://t.co/87uN1vnMu8

🤗🤗🤗

blakedanson retweeted

about 2 years ago

We are thrilled to be a launch partner for Meta Llama 3. Experience Llama 3 now at up to 350 tokens per second for Llama 3 8B and up to 150 tokens per second for Llama 3 70B, running in full FP16 precision on the Together API! 🤯 https://t.co/uSCFhbnp8E

blakedanson retweeted

over 2 years ago

How to write CUDA on AMD.

blakedanson retweeted

Brett Adcock

@adcock_brett

over 2 years ago

OpenAI + Figure conversations with humans, on end-to-end neural networks:
→ OpenAI is providing visual reasoning & language understanding
→ Figure's neural networks are delivering fast, low level, dexterous robot actions
(thread below)

blakedanson retweeted

over 2 years ago

I used to find writing CUDA code rather terrifying. But then I discovered a couple of tricks that actually make it quite accessible. In this video I introduce CUDA in a way that will be accessible to Python folks, & I even show how to do it all in Colab! https://t.co/WGXXctbalv

blakedanson retweeted

Google AI

@GoogleAI

over 2 years ago

Most of an LLM’s memory & compute are consumed by matrix multiplication operations. Today on the blog, learn about techniques used to accelerate mixed-input matrix multiplication for increased efficiency w/ performance close to peak hardware capabilities ↓https://t.co/l4rP606zKY
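To make "mixed-input" concrete, here is a toy sketch in plain Python: float activations multiplied by weights stored as 8-bit integers plus a scale, with each weight upconverted at the moment it is used. This only illustrates the general idea. The symmetric quantization scheme, function names, and values are assumptions, and the sketch does not reproduce the hardware-level techniques the blog post describes.

```python
def quantize_int8(weights):
    """Symmetric per-matrix quantization: returns (integer weights, scale)."""
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / 127 if max_abs else 1.0
    q = [[max(-127, min(127, round(w / scale))) for w in row] for row in weights]
    return q, scale

def mixed_input_matmul(activations, q_weights, scale):
    """Multiply float activations by int8 weights, upconverting each weight on use."""
    inner, cols = len(q_weights), len(q_weights[0])
    return [
        [sum(row[k] * (q_weights[k][j] * scale) for k in range(inner))
         for j in range(cols)]
        for row in activations
    ]

W = [[0.5, -1.0], [0.25, 0.75]]  # float weights, hypothetical values
X = [[1.0, 2.0]]                 # float activations, hypothetical values

qW, s = quantize_int8(W)         # weights now fit in 8 bits each
Y = mixed_input_matmul(X, qW, s)

print(qW, s)
print(Y)  # close to the full-precision product of X and W
```

Storing the weights in 8 bits cuts their memory footprint, at the cost of a small rounding error and the extra conversion step inside the multiply.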

blakedanson retweeted

Google AI

@GoogleAI

over 2 years ago

Introducing ASPIRE, a framework that enhances the selective prediction capabilities of large language models, enabling them to output an answer paired with a confidence score. Learn how it outperforms state-of-the-art methods on a variety of QA datasets. → https://t.co/NQeIV1fXJ6
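A generic sketch of selective prediction, the setting the post refers to: the system answers only when its confidence clears a threshold and abstains otherwise, trading coverage for accuracy. This is not ASPIRE's method. The confidence scores, threshold, examples, and function names are all made up for illustration.

```python
def selective_predict(answer, confidence, threshold=0.7):
    """Return the answer if confidence clears the threshold, else abstain (None)."""
    return answer if confidence >= threshold else None

def coverage_and_accuracy(examples, threshold):
    """examples: list of (answer, confidence, gold).

    Returns (fraction of questions answered, accuracy on the answered ones).
    """
    answered = [
        (a, g) for a, c, g in examples
        if selective_predict(a, c, threshold) is not None
    ]
    coverage = len(answered) / len(examples)
    accuracy = sum(a == g for a, g in answered) / len(answered) if answered else 0.0
    return coverage, accuracy

# Hypothetical model outputs: (predicted answer, confidence score, gold answer)
examples = [
    ("Paris", 0.95, "Paris"),
    ("1912", 0.85, "1912"),
    ("Venus", 0.40, "Mars"),
    ("blue", 0.55, "blue"),
]

print(coverage_and_accuracy(examples, threshold=0.0))  # answer everything
print(coverage_and_accuracy(examples, threshold=0.7))  # answer only when confident
```

Raising the threshold here drops one wrong answer and one correct answer, so accuracy on the answered questions goes up while coverage goes down.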
