Andy Gray

@andynotabot

AI Start-up founder at Kortical. Since the age of 15, I've been chasing the dream of building an AI to do my chores... but for now it's mainly B2B AI

Joined November 2017

193 Following

43 Followers

102 Posts

Andy Gray

@andynotabot

about 1 month ago

It is well established that models often memorise some subsets of their data, but the key distinction is that this isn't ONLY what they do. This is an explainer for a paper that shows they can learn rules that are provably beyond interpolation, so they can learn things that cannot be reached by memorisation or by interpolating between known datapoints. Pretty cool, huh? https://t.co/nNfvyqfbLc

Andy Gray

@andynotabot

3 months ago

https://t.co/4NOwf7TZ9m

Andy Gray

@andynotabot

3 months ago

@fchollet Exciting timing! I just published a paper showing transformers can learn held-out rules where interpolation provably scores 0%. It rules out the main argument that scores will stop climbing on ARC-AGI-2 and now ARC-AGI-3. Paper and explainer here: https://t.co/ygaLBR6qXz

Andy Gray

@andynotabot

3 months ago

https://t.co/4NOwf7TZ9m

Andy Gray

@andynotabot

3 months ago

@GaryMarcus your ACM piece made the case that LLMs are fundamentally limited to interpolation, no doubt why you expect this market plateau. I just published results that might surprise you: transformers hit 97.9% on held-out rules where every interpolation method scores 0%, backed by a formal proof. Would love your take. Full explainer and paper here: https://t.co/nNfvyqfbLc

Andy Gray

@andynotabot

3 months ago

https://t.co/4NOwf7TZ9m

andynotabot retweeted

Andrej Karpathy

@karpathy

8 months ago

Excited to release new repo: nanochat! (it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:

- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Efficient inference the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUI.
- Write a single markdown report card, summarizing and gamifying the whole thing.

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. About ~12 hours surpasses GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and 70s on ARC-Easy, 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.

Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
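
A quick sanity check of the cost figures quoted in the post, as a minimal Python sketch. The hourly rate of $24 for the 8XH100 node is an assumption, back-derived from "~$100 for ~4 hours"; the post does not state a rate, and real cloud pricing varies. The function and constant names are hypothetical.

```python
# Assumed node price, inferred from "~$100 in cost (~4 hours on an 8XH100 node)".
# This is NOT stated in the post; actual cloud rates differ by provider.
NODE_RATE_USD_PER_HOUR = 24.0


def training_cost_usd(hours: float, rate: float = NODE_RATE_USD_PER_HOUR) -> float:
    """Estimated cost of renting the node for the given number of hours."""
    return hours * rate


# Training durations mentioned in the post.
runs = [
    ("speedrun (~4 h)", 4.0),
    ("surpasses GPT-2 CORE (~12 h)", 12.0),
    ("depth 30 model (24 h)", 24.0),
    ("~$1000 tier (~41.6 h)", 41.6),
]

for label, hours in runs:
    print(f"{label:<30} ~${training_cost_usd(hours):,.0f}")
```

Under that assumed rate, the 4-hour and 41.6-hour runs come out close to the ~$100 and ~$1000 figures in the post, which suggests the quoted numbers are mutually consistent.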

682

24K

18K

andynotabot retweeted

TechHalla

@techhalla

10 months ago

I asked nano banana to take me across Middle-earth and this is what happened... all the details on how I made this video, below 🧵👇

271

731

966K

Andy Gray

@andynotabot

about 1 year ago

😂 I thought it was a terrible paper! Disingenuous, badly researched and hyperbolic. I wrote an article with a detailed takedown here: https://t.co/t2O9zQlzsM That said, I do think over-anthropomorphising LLMs is a bad idea and we can see way too much of it already, so I don't disagree with you entirely 😆

111

Andy Gray

@andynotabot

about 1 year ago

I'm seeing lots of people taking Apple's new paper "The Illusion of Thinking" at face value, but there is so much wrong with it that I felt compelled to write an article debunking its claims: https://t.co/t2O9zQlzsM I dive into why, but it looks like they are knowingly trying to create FUD about AI

Andy Gray

@andynotabot

about 1 year ago

@rubenhassid I'm seeing lots of people taking Apple's new paper "The Illusion of Thinking" at face value, but there is so much wrong with it that I felt compelled to write an article debunking its claims: https://t.co/DwQzfeqaLz AGI is on track 😉

andynotabot retweeted

Tesla Optimus

@Tesla_Optimus

about 1 year ago

I’m not just dancing all day, ok

36K

andynotabot retweeted

Min Choi

@minchoi

about 1 year ago

GPT-4o just got an INSANE upgrade!

OpenAI just dropped native Image Generation in GPT-4o.

Image & Text quality is insane. 100% AI

10 wild examples (prompts included):

1. Polaroid style photographs https://t.co/FRPIsVkMYW

166

466

Andy Gray

@andynotabot

over 1 year ago

Seems like home robots could be happening sooner than we thought! Either that or they're looking to raise another round 😉

Brett Adcock

@adcock_brett

over 1 year ago

Important update: Figure is launching robots into the home. Our AI, Helix, is advancing faster than any of us anticipated, accelerating our timeline into the home. Therefore, we've moved up our home timeline by 2 years, starting Alpha testing this year.

554

646

578

840K

andynotabot retweeted

Min Choi

@minchoi

over 1 year ago

Anthropic just dropped Claude Code 🤯 Now you can delegate coding tasks right from the terminal. Limited research preview for now

328

155

43K

andynotabot retweeted

Tesla AI

@Tesla_AI

over 1 year ago

Teslas now drive themselves from their birthplace at the factory to their designated loading dock lanes without human intervention. One step closer to large-scale unsupervised FSD.

45K

60M

andynotabot retweeted

Brian Roemmele

@BrianRoemmele

over 1 year ago

Deep Dive On DeepSeek’s New Multimodal AI Released Today And How We Are Getting It Running On A Gaming PC!

DeepSeek’s Janus-Pro represents a significant advancement in multimodal large language models (LLMs), particularly in text-to-image generation. Building upon the foundation of the original Janus model, Janus-Pro introduces enhancements in training processes, data quality, and model architecture, resulting in more stable and detailed image outputs.

Technical Architecture:

Janus-Pro employs a decoupled architecture, optimizing it for tasks involving both multimodal understanding and text-to-image generation. This design allows for separate processing pathways for different modalities, enhancing the model’s flexibility and performance.

The model has been trained on a diverse dataset comprising multimodal, textual, and synthetic aesthetic data through a three-stage process, ensuring superior performance across various tasks.

Performance Benchmarks:

Janus-Pro has demonstrated exceptional capabilities:

• Text-to-Image Generation:
  • GenEval: Scored 0.80, surpassing OpenAI’s DALL-E 3 (0.67) and Stability AI’s Stable Diffusion 3 Medium (0.74).
  • DPG-Bench: Achieved an overall accuracy of 84.19, highlighting its proficiency in handling dense and nuanced prompts.
• Multimodal Understanding:
  • MMMU (Massive Multi-discipline Multimodal Understanding): Attained an accuracy of 41.0, outperforming models like TokenFlow-XL (38.7).
  • MME (Multimodal Evaluation): Showed significant gains in reasoning and contextual understanding.

These results underscore Janus-Pro’s capabilities in both generating high-quality images from textual prompts and understanding complex multimodal inputs.

Running Janus-Pro on Consumer-Grade GPUs

These are some of the techniques we deploy when adapting a new larger AI model to run efficiently on less expensive computer hardware. This is not an exhaustive list but enough to give you an idea and overview.

1. Model Quantization: Reducing the precision of the model’s weights (e.g., from 16-bit to 8-bit or lower) can significantly decrease memory usage and computational requirements, enabling the model to run on GPUs with limited VRAM. Tools like MiniLLM facilitate running large language models on consumer-grade GPUs. We also employ distillation processes to further improve GPU cycles.

2. Efficient Inference Engines: Utilizing inference engines designed for consumer hardware can enhance performance. For instance, PowerInfer is a high-speed LLM inference engine optimized for personal computers equipped with a single consumer-grade GPU. It exploits the high locality inherent in LLM inference to reduce GPU memory demands and CPU-GPU data transfers.

3. Hardware Considerations: High-end consumer GPUs, such as the NVIDIA RTX 4090, are more suitable for running large models like Janus-Pro due to their substantial VRAM and computational capabilities. However, with appropriate optimization techniques, it’s possible to run the model on GPUs with lower specifications, though performance may be affected.

These are some of the strategies we are deploying to run Janus-Pro on consumer-grade gaming computers.

By leveraging Janus-Pro, developers and researchers can explore advanced capabilities in both multimodal understanding and image generation, pushing the boundaries of what’s achievable in AI-driven applications.

We will keep you updated on the progress.
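
To make the quantization point in the post concrete, here is a minimal Python sketch of how weight precision translates into memory. The 7-billion-parameter count is an assumption chosen for illustration (the post does not give a model size), and the helper name is hypothetical. The estimate covers weights only and ignores activations, caches, and framework overhead.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed parameter count, for illustration only; not taken from the post.
PARAMS = 7_000_000_000

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Halving the bits per weight halves the weight footprint, which is why going from 16-bit to 8-bit or lower can bring a model within reach of a GPU with limited VRAM.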

54K

Andy Gray

@andynotabot

over 1 year ago

AGI isn’t “near,” @sama; it’s already here. We coined AGI thinking of human-level intelligence. But LLMs are already general, intelligent, just not sentient, superhuman or demanding civil rights. It’s time to redefine:

1️⃣ AGI = Broad general knowledge, common-sense machines (e.g., LLMs).
2️⃣ ASI = Artificial Super-intelligence: superintelligent but not necessarily conscious.
3️⃣ ACI = Artificial Conscious Intelligence: thinking, feeling machines that have a sense of self, needs and wants, etc.

Let’s update the terms. #AI

andynotabot retweeted

Yam Peleg

@Yampeleg

over 1 year ago

Heard a leak from one of the frontier labs (not oai tbh), they reached an unexpected HUGE wall of diminishing returns trying to brute-force better results by training longer & using more and more data.. (more severe than what is published publicly)

166

314

886K

andynotabot retweeted

Yann LeCun

@ylecun

over 1 year ago

Not a surprising result. But good that someone tried this out.

147

877

419

386K

Andy Gray

@andynotabot

over 1 year ago

I've spent some time now with @OpenAI o1. The model that was "too dangerous" to release. OpenAI's first specialist reasoning model.

The big question I wanted to answer was just how good is o1 at reasoning?

Despite all the hype, many people are skeptical if LLMs can reason at all. Is it just mimicry, like a parrot, repeating words it doesn't understand?

In this article https://t.co/aVuC8CuilF I give a bit of background, show how I set about to prove it one way or the other and talk through the surprising result. Not to overcook it too much but... I was genuinely not expecting this result.

Let me know if you enjoy the read! 😃

andynotabot retweeted

Demis Hassabis

@demishassabis

over 1 year ago

Feedback loop: train SOTA chip design model (AlphaChip) -> use it to design better AI chips -> use them to train better models -> to design better chips... part of the reason why our TPU stack is so good. Congrats @Azaliamirh, @annadgoldie, @JeffDean & the AlphaChip team!

160

228

234K
