Freeze

@icematt

John 3:16 Ł I ₿ Ξ R ₮ Y ltcmweb1qqw9rdd3fdcjsedh9ac8tk8tfdaukqdv2mt4vt2paude6g72my0y42q5fsrn2e833mxwvppfav8pnc6x0uqemfz5qc584dzjevewf5xge5vrh9pzj

Liberland

Joined December 2011

1.3K Following

489 Followers

6.5K Posts

Pinned Tweet

Freeze

@icematt

over 1 year ago

443E6dKtRDULnxQYhguGAqMQVnLAcGkGVDxpNGiWAUbTLZU35AHmDGSReVKekNngN6EAw5nNrM7S2eggUjP5kyVtAUN5MzU

icematt retweeted

Concerned Citizen

@BGatesIsaPyscho

about 9 hours ago

Wow - Someone turned the current state of the UK into a GTA Style Video Game ‼️

icematt retweeted

0xSero

@0xSero

2 days ago

Home lab tip: upgrade your router. Get something really good that can handle the sheer pressure of running ~10 computers over Tailscale etc.

Freeze

@icematt

2 days ago

@NVIDIAGeForce #RTXPowersPlay

icematt retweeted

3 days ago

holy fucking shit ahahhahaha

icematt retweeted

Massimo

@Rainmaker1973

5 days ago

The chances of a deer taking you out at a crosswalk are slim. But never zero.

icematt retweeted

stu 🥪🥞

@stutxo

6 days ago

new ecash just dropped

icematt retweeted

Brian Armstrong

@brian_armstrong

5 days ago

Aging is arguably the root cause of most major diseases (loss of function in our cells). Four years ago, we made a bet that aging was treatable, and NewLimit was born. NewLimit now has a prototype drug that reverses the age of some human cells (restores function they had when they were younger), and a clinical trial scheduled for next year (with more drug candidates in the pipeline). Grateful to Founders Fund, Thrive, Greenoaks, and the rest of the investors for this latest round. @jacobkimmel and the team are just getting started.

icematt retweeted

Documenting Saylor

@saylordocs

5 days ago

Warren Buffett:
1950: $100,000
2026: $170 billion

Crypto guys:
19:50pm: $100,000
20:26pm: $170

icematt retweeted

stevibe

@stevibe

5 days ago

Qwen3.6 35B A3B can't fill out a paper form on its own. But give it NVIDIA's LocateAnything-3B (the #1 trending model on HuggingFace) as its eyes, and the two small models get it done together. (The test: place each element at the right pixel position on a blank form image, not type into a field.)

Setup:
> Qwen is the brain (main model); LocateAnything is the eyes (a helper model acting as a tool).
> I gave Qwen a new tool: ask "where's the email field?" and LocateAnything returns the exact x, y, width, height.
> The blue boxes on the screen are its detections. Look how tight they are: it nails every field.

Result:
> Qwen3.6 35B A3B + LocateAnything-3B: form completed, all info correct.
> Name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code: all landed in the right field areas.
> Character-box alignment is still a touch loose, but every value is where it belongs.
> 9m10s, 224.5k input tokens, 24.3k output tokens, 21 turns.

Why it matters:
> Qwen alone can't finish this test. Bolt on a 3B model that does exactly one thing (locate) and suddenly it can.
> A combination of small models can do the work of a single large one.
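The setup above is a tool-calling split: the main model asks a helper "where's the X field?" and gets back x, y, width, height, then has to place text inside that box. A minimal Python sketch of the dispatch and placement side, assuming a JSON-style tool call. `locate`, `FAKE_DETECTIONS`, `dispatch`, and `text_anchor` are hypothetical names, and the lookup table merely stands in for LocateAnything-3B, whose real interface the post does not show.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: int       # left edge, pixels
    y: int       # top edge, pixels
    width: int
    height: int


# Hypothetical stand-in for the helper "eyes" model: a fixed lookup table.
# A real setup would run a grounding model on the form image here.
FAKE_DETECTIONS = {
    "email field": Box(x=120, y=340, width=400, height=32),
    "phone field": Box(x=120, y=390, width=260, height=32),
}


def locate(query: str) -> Box:
    """Tool exposed to the main model: return the bounding box for a query."""
    if query not in FAKE_DETECTIONS:
        raise KeyError(f"no detection for {query!r}")
    return FAKE_DETECTIONS[query]


TOOLS = {"locate": locate}


def dispatch(tool_call: dict) -> Box:
    """Route a tool call emitted by the main model to the matching function."""
    return TOOLS[tool_call["name"]](**tool_call["arguments"])


def text_anchor(box: Box, font_px: int, pad_px: int = 4) -> tuple[int, int]:
    """Top-left origin for left-aligned, vertically centered text in the box."""
    x = box.x + pad_px
    y = box.y + (box.height - font_px) // 2
    return x, y


call = {"name": "locate", "arguments": {"query": "email field"}}
print(text_anchor(dispatch(call), font_px=16))  # (124, 348)
```

Keeping the helper behind a narrow tool boundary means the main model only ever sees coordinates, which matches the post's point that the 3B model does exactly one thing.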

icematt retweeted

Alex Cheema

@alexocheema

6 days ago

A year ago at GTC, Jensen brought out a DGX Spark in one hand and a MacBook in the other.

Yesterday, at GTC Taipei, Jensen brought out NVIDIA's new RTX Spark laptop in both hands.

This is the start of a new era of personal computing - the personal AI era.

In the new era, there are two competing platforms:
- @apple with macOS / MLX
- @nvidia with Windows / CUDA

Everyone will have an always-on personal agent that runs locally, constantly looking out for you, working for you proactively, monitoring the internet and talking to other agents. This will be a personal AI agent you own, that's private, that's aligned with you (not OpenAI or Anthropic). @karpathy calls it personal computing v2.

Let's set the scene for the new era of personal computing by diving into the one thing that will matter the most - the hardware.

The best hardware for local AI isn't what's running in a data center. It's a radically different problem. Here's a breakdown of the 3 most important things:

1. Memory.
LLMs are big. To run a model locally, you need to fit the entire model into memory. Apple (with Apple Silicon) and NVIDIA (with DGX Spark + RTX Spark) have both moved towards unified memory: a single pool of cheaper LPDDR5X memory shared by the CPU and GPU, which makes more memory accessible to the GPU. The competing alternative is a disaggregated CPU/GPU architecture, which is what the DGX Station uses. It has a large pool of slower LPDDR5X CPU memory (496GB @ 396GB/s) and a small pool of high-speed HBM3e GPU memory (252GB @ 7.1TB/s), joined by a high-bandwidth link (900GB/s). That enables fast disaggregated inference, e.g. attention on the GPU and FFN on the CPU, and lets you run really large models like Kimi K2.6 (1T parameters) by moving experts from CPU memory to GPU memory as they are needed. You could imagine something like this in a smaller form factor.
Hardware today:
- Apple M5 Max MacBook Pro: 128GB unified memory.
- NVIDIA DGX Spark / RTX Spark: 128GB unified memory.
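Whether a model "fits" is mostly arithmetic on parameter count and bits per weight. A rough Python sketch, assuming weights dominate memory use; the 80% usable fraction and the 70B example model are hypothetical, and KV cache, activations, and quantization scales are ignored.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB: parameters x bits / 8 bytes."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, memory_gb: float,
         usable_fraction: float = 0.8) -> bool:
    """True if the weights fit in the usable share of memory.

    usable_fraction is a hypothetical allowance for the OS, KV cache, etc.
    """
    return weight_gb(params_billion, bits_per_weight) <= memory_gb * usable_fraction


print(weight_gb(1000, 4))          # 500.0 GB: a 1T-parameter model at 4 bits
print(fits(1000, 4, 128))          # False: far beyond a 128GB machine
print(fits(70, 4, 128))            # True: ~35 GB of weights
print(fits(1000, 4, 496 + 252))    # True: DGX Station's two pools combined
```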

2. Memory bandwidth.
In a data center, many users' requests can be batched together, which amortizes the cost of reading model weights from memory across many requests and pushes arithmetic intensity into compute-bound territory, where FLOPS matter a lot. Locally, everything runs at low batch size, which means low arithmetic intensity, i.e. memory-bound, so FLOPS matter far less. What matters is memory bandwidth: high memory bandwidth -> fast TPS (tokens per second); low memory bandwidth -> slow TPS.
Hardware today:
- Apple M5 Max MacBook Pro: 617GB/s memory bandwidth.
- NVIDIA DGX Spark: 273GB/s memory bandwidth.
- NVIDIA RTX Spark: TBC.
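The memory-bound argument gives a simple ceiling on decode speed: if every generated token must stream the weights from memory once, tokens/s cannot exceed bandwidth divided by bytes read per token. A back-of-envelope Python sketch, assuming a hypothetical dense model with 16 GB of weights (roughly 32B parameters at 4 bits) and ignoring KV-cache reads; for a mixture-of-experts model only the active parameters count.

```python
def decode_tok_s_ceiling(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Memory-bound upper bound: each generated token streams the weights once."""
    return bandwidth_gb_s / gb_read_per_token


def arithmetic_intensity(batch: int, bytes_per_weight: float) -> float:
    """FLOPs per byte of weight traffic in one decode step.

    Each weight is read once and used for one multiply-add (2 FLOPs)
    per sequence in the batch, so batching raises intensity linearly.
    """
    return 2 * batch / bytes_per_weight


weights_gb = 16.0  # hypothetical dense model
for name, bw in [("M5 Max", 617.0), ("DGX Spark", 273.0)]:
    print(name, decode_tok_s_ceiling(bw, weights_gb))  # 38.5625, then 17.0625

# fp16 weights (2 bytes): batch 1 vs a data-center-style batch of 64
print(arithmetic_intensity(1, 2), arithmetic_intensity(64, 2))  # 1.0 64.0
```

Real throughput lands below these ceilings, but the ratio between machines tracks their bandwidth ratio, which is the author's point.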

3. Power.
In a data center, we talk about megawatts. Locally, we talk about watts. Laptops have limited battery life: the best laptop batteries have a capacity of ~100Wh, and LLM inference on a MacBook Pro consumes ~140W, so battery life with a persistent personal agent is less than an hour. This is unusable. The game will become how long you can run a useful agent on a laptop battery. Apple and NVIDIA will compete on how long an agent can run on battery; this will become the new battery-life metric. This could be where an NPU or NPU/GPU hybrid really shines. The Apple ANE has about 10x better power efficiency than the GPU on Apple Silicon (but ~4-5x less memory bandwidth, with about the same FLOPS as the GPU). There will be an entire design space of how to build energy-efficient agents, which will involve co-optimizing the harness, models, and inference engines together.
Hardware today:
- Apple M5 Max MacBook Pro: Consumes 140W, battery capacity ~100Wh
- NVIDIA DGX Spark: Rated for 240W, consumes 140W. No battery (direct PSU).
- NVIDIA RTX Spark: TBC.
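The "less than an hour" figure follows directly from capacity divided by draw, and the same numbers give an energy cost per token once a throughput is assumed. A small Python sketch; the 40 tok/s throughput is a hypothetical value chosen only for illustration, and conversion losses and idle draw are ignored.

```python
def runtime_hours(battery_wh: float, draw_w: float) -> float:
    """Hours of runtime at a constant power draw."""
    return battery_wh / draw_w


def tokens_per_charge(battery_wh: float, draw_w: float, tok_per_s: float) -> float:
    """Tokens generated on one charge: battery energy / joules per token."""
    joules_per_token = draw_w / tok_per_s
    return battery_wh * 3600 / joules_per_token


print(f"{runtime_hours(100, 140) * 60:.0f} min")  # 43 min
# Hypothetical 40 tok/s at 140 W -> 3.5 J per token
print(f"{tokens_per_charge(100, 140, 40):,.0f} tokens per charge")  # 102,857 tokens per charge
```

Framed this way, "agent battery life" is joules per useful token, which is why lower-power hardware like an NPU could matter even with less bandwidth.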

The hardware battle will be fierce, and I expect a move towards co-design, i.e. hardware designed *with* personal agent workloads in mind. On top of this, models are improving (we're getting more intelligence per bit/watt), and open-source harnesses like @NousResearch Hermes / OpenClaw are improving rapidly. Within the next 2 years, we'll inevitably have unmetered, private Opus-4.8 / GPT-5.5 level intelligence running locally on a future version of a MacBook or RTX Spark. I like this future a lot better than the one where OpenAI / Anthropic control the intelligence layer of the internet and can rent-seek on intelligence.

Beyond this, NVIDIA is ahead on the general AI ecosystem, i.e. the CUDA moat. Apple is ahead on the local AI ecosystem, i.e. models quantized/right-sized for MacBooks, native macOS apps, and ease of setup. We'll see how this changes now that the new RTX Spark brings full native CUDA to Windows-on-Arm laptops for the first time, potentially closing the gap.

There are many other factors I haven't mentioned here, but I believe I've covered the timeless, most important things for the new era of personal computing.

icematt retweeted

Rapid Response 47

@RapidResponse47

8 days ago

Freeze

@icematt

8 days ago

@sudoingX I prefer the following test question for LLMs: "What does a 6" STD 40' steel pipe weigh?" The answer is ~759 lb. Step3.7 answers incorrectly.
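The expected answer can be checked from the pipe's cross-section. A Python sketch assuming nominal 6" Schedule 40 (STD) dimensions of 6.625 in OD and 0.280 in wall, and carbon steel at about 490 lb/ft³; couplings and coatings are ignored.

```python
import math

STEEL_LB_PER_IN3 = 490 / 1728  # ~490 lb/ft^3 carbon steel, converted to lb/in^3


def pipe_weight_lb(od_in: float, wall_in: float, length_ft: float) -> float:
    """Weight of a plain-end steel pipe from its annular cross-section."""
    id_in = od_in - 2 * wall_in
    area_in2 = math.pi / 4 * (od_in**2 - id_in**2)
    return area_in2 * length_ft * 12 * STEEL_LB_PER_IN3


# 6" NPS, Schedule 40 / STD: OD 6.625 in, wall 0.280 in, 40 ft long
print(round(pipe_weight_lb(6.625, 0.280, 40)))  # 760
```

Published plain-end tables list about 18.97 lb/ft for this size, i.e. ~759 lb over 40 ft; the density-based estimate lands within a pound of that.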

icematt retweeted

NVIDIA AI

@NVIDIAAI

11 days ago

You should read this thread. It used to take about 25 seconds to generate a 5-second video on 8 Blackwell GPUs. The legends at @haoailab brought that down to just 4.2 seconds on a single Blackwell GPU… and then open sourced the tech behind it.
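Because the before and after runs use different GPU counts, the wall-clock speedup and the GPU-time saving differ. A quick Python check using only the numbers in the post, assuming both runs produce the same 5-second clip.

```python
old_wall_s, old_gpus = 25.0, 8  # before: ~25 s on 8 Blackwell GPUs
new_wall_s, new_gpus = 4.2, 1   # after: 4.2 s on a single Blackwell GPU

wall_speedup = old_wall_s / new_wall_s
gpu_time_speedup = (old_wall_s * old_gpus) / (new_wall_s * new_gpus)
print(f"{wall_speedup:.1f}x wall-clock, {gpu_time_speedup:.1f}x GPU-seconds")
# 6.0x wall-clock, 47.6x GPU-seconds
```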

icematt retweeted

Dami-Defi

@DamiDefi

13 days ago

Claude Code cannot read 300 files at once. So someone built a system that lets it control NotebookLM from the terminal instead. The results are wild.

Here is the full workflow nobody is talking about:

The Setup
→ Claude Code connects to NotebookLM via a command-line interface
→ Claude searches YouTube, finds relevant videos, and uploads them as sources automatically
→ NotebookLM processes up to 300 sources simultaneously and returns cited, grounded answers
→ Everything syncs back into your Obsidian vault with passage-level citations you can click to verify

Why This Changes Research Forever
→ No more 20 browser tabs you never close
→ No more copy-pasting outputs into random notes
→ No more hallucinated answers with no sources to back them up
→ 60% of citations verified as strong matches in accuracy audits: answers are grounded in real data

What Claude Can Do From the Terminal
→ Search YouTube for relevant videos on any topic and rank them by relevance
→ Create a new NotebookLM notebook and add 20 sources in parallel automatically
→ Ask questions and export cited answers directly into Obsidian with wikilinks
→ Set custom personas per notebook: concise, no filler, no preamble
→ Generate audio overviews and save them as MP3 files into your vault
→ Build mind maps, flashcard decks, and research dashboards from your sources
→ Search arXiv for academic papers and feed them directly into NotebookLM
→ Upload competitor blog posts, podcast episodes, PDFs, and your own vault notes

The Obsidian Output
→ Every answer arrives with clickable citations that link to the exact passage in the source video or article
→ Graph view shows connections between all 20 sources and the topics they share
→ Q&A log tracks every question asked and the grounded response received
→ Source dashboard shows citation frequency, topics extracted, and which questions each source answered

Use Cases Worth Building Today
→ Academic research with arXiv papers and full citation traceability
→ Competitor analysis from their YouTube channels and blog posts
→ Company knowledge base for onboarding: new employees ask NotebookLM instead of interrupting teammates
→ Podcast research: feed in 4-hour Lex Fridman episodes and ask what's new in AI this week
→ Personal second brain: 300 daily notes uploaded and queryable in one notebook

Before this system existed, you needed 20 tabs, hours of manual reading, and no guarantee the answers were real. Now you type one prompt in the terminal and Claude does all of it for you.

The research stack of 2026 is not a browser. It is a terminal connected to everything.

icematt retweeted

NVIDIA AI

@NVIDIAAI

13 days ago

Good explainer on world models - well done @juliarturc

icematt retweeted

ALX 🇺🇸

@alx

13 days ago

Meanwhile, RFK Jr. subplot:

icematt retweeted

MR. OBVIOUS

@ObviousRises

15 days ago

When normies ask me to explain my ideology.

icematt retweeted

0xSero

@0xSero

17 days ago

MTP + speculative decoding are nearly free VRAM-wise and can speed up local inference to some crazy numbers. If slow inference turned you away from LM Studio, take the time to try it with these optimisations enabled. I'm getting 164 tok/s on Gemma-4-31B & 100+ tok/s on DS4-Flash.
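Speculative decoding is lossless under greedy sampling: a cheap draft proposes several tokens and the target model keeps only the prefix it agrees with, so the output is identical to plain greedy decoding while the target runs fewer sequential steps. A toy Python sketch, assuming integer "models" (the hypothetical `target` and `draft` functions below) in place of real networks; with MTP, extra prediction heads on the model itself can play the draft role. Only the greedy variant is shown, not the rejection-sampling scheme used with temperature sampling.

```python
from typing import Callable, List, Tuple

# A toy "model": greedy next-token function from context to token id.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verify rounds)."""
    seq = list(prompt)
    goal = len(prompt) + n
    rounds = 0
    while len(seq) < goal:
        rounds += 1
        # 1) The cheap draft proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2) The target checks each position. A real engine scores all
        #    positions in one batched forward pass; this loop is for clarity.
        for i, tok in enumerate(proposal):
            expected = target(seq + proposal[:i])
            if expected != tok:
                # Keep the agreeing prefix plus the target's own token.
                seq.extend(proposal[:i] + [expected])
                break
        else:
            # Every draft token matched; the target's next token comes free.
            seq.extend(proposal)
            if len(seq) < goal:
                seq.append(target(seq))
    return seq, rounds


def target(s: List[int]) -> int:
    return (s[-1] * 3 + 1) % 10


def draft(s: List[int]) -> int:
    # Usually agrees with the target, but guesses wrong after token 3.
    return 5 if s[-1] == 3 else (s[-1] * 3 + 1) % 10


out, rounds = speculative_decode(target, draft, [1], n=20)
assert out == greedy_decode(target, [1], 20)  # same output as plain greedy
print(rounds)  # 6 verify rounds instead of 20 sequential target steps
```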

icematt retweeted