mtonsager

@MattTonsager

Joined December 2008

771 Following

82 Followers

177 Posts

MattTonsager retweeted

Scott Hanselman 🌮

@shanselman

4 days ago

VibeOS - Fully Hallucinated Operating System from Microsoft BUILD #msbuild by @stevensanderson (genius) (relax, it's a joke)

359

166

78K

MattTonsager retweeted

Vincent Koc

@vincent_koc

3 days ago

2 months ago I gave a wake-up call to engineers. Some thought I was gloating about tokenmaxxing. But the acceleration is real. I gave a glimpse into the factory we are building. Since then the machine has been at full throttle.

237

160

53K

MattTonsager retweeted

Ivan Kuznetsov

@ikuznetsov_com

3 days ago

I've just implemented Patrol: a self-improvement loop that slices the repo and looks for issues to fix. I got the idea from https://t.co/uab3Cxr6RI by @steipete, and with some tweaks I've fully integrated it into the Hive.

132

24K

MattTonsager retweeted

Ahmad

@TheAhmadOsman

6 days ago

Local AI hardware = capacity × bandwidth × software stack

- Capacity tells you what fits
- Bandwidth tells you how hard the box can breathe
- The software stack tells you how much of the spec sheet you can actually cash out.

Hardware by Memory Bandwidth
- Mac Studio M3 Ultra: up to 512GB @ 819 GB/s
- RTX PRO 6000 Blackwell: 96GB @ 1792 GB/s
- RTX 5090: 32GB @ 1792 GB/s
- RTX 4090: 24GB @ 1008 GB/s
- RX 7900 XTX: 24GB @ 960 GB/s
- Radeon PRO W7900: 48GB @ 864 GB/s
- AMD Radeon AI PRO R9700: 32GB @ 640 GB/s
- Intel Arc Pro B65: 32GB @ ~608 GB/s
- Tenstorrent Wormhole n300: 24GB @ 576 GB/s
- Tenstorrent Blackhole p150: 32GB @ 512 GB/s + 800G
- MacBook Pro M5 Max: 460-614 GB/s
- MacBook Pro M5 Pro: 307 GB/s
- DGX Spark: 128GB @ 273 GB/s (coherent + CUDA)
- Mac mini M4 Pro: 273 GB/s
- Ryzen AI Max / Strix Halo: ~256 GB/s (~96GB usable GPU)
- MacBook Air M5: 153 GB/s
- Snapdragon X2 Elite: 152-228 GB/s
- Intel Lunar Lake: 136 GB/s
- Snapdragon X Elite: 135 GB/s
- Mac mini M4: 120 GB/s
- Arc Pro B60: 24GB @ ~456 GB/s

Verdict

- GPUs are still the bandwidth kings

- Apple wins: when you need huge amounts of memory and don’t want to shard across GPUs
- Apple loses: when raw tokens/sec & concurrency matter more

- DGX Spark: coherent memory + NVIDIA stack

- Strix Halo / Ryzen AI Max: first real x86 unified-memory contender

- Tenstorrent: fully OSS stack, excited to see this mature

Fitting ≠ serving

Even if it fits, you still pay for
- bandwidth during decode
- KV cache growth
- dequantization
- batching + concurrency
- scheduler quality
- framework overhead

The only mental model that matters:

1. What must fit?
2. What bandwidth tier do I need?
3. What software stack can actually deliver it?

In short:
- NVIDIA → fastest raw speed
- Apple Studio M3 Ultra → biggest one-box memory
- Strix Halo → first real x86 unified
- DGX Spark → coherent NVIDIA dev appliance
- AMD / Intel Arc → rising alternatives
- Tenstorrent → fully open-source stack

Do ask: “which bottleneck am I buying?”

Not: “which hardware is best?”
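The three questions above can be turned into a rough back-of-the-envelope check. The sketch below is a rule of thumb, not a benchmark: it assumes single-stream decode reads roughly all model weights once per token, so the ceiling is bandwidth divided by weight size. The function names and the 20 GB model size are hypothetical; the capacity and bandwidth figures are copied from the list above.

```python
# Rule-of-thumb sketch (an assumption, not a measurement): during decode each
# generated token streams roughly all weights from memory once, so tokens/sec
# is capped near bandwidth / weight size. Real throughput is lower because of
# KV cache reads, dequantization, and framework overhead.

def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on single-stream decode speed under the assumption above."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


def fits(capacity_gb: float, weights_gb: float, kv_cache_gb: float = 0.0) -> bool:
    """Capacity check: weights plus KV cache must fit in memory."""
    return weights_gb + kv_cache_gb <= capacity_gb


# (capacity in GB, bandwidth in GB/s), taken from the list above.
BOXES = {
    "RTX 5090": (32, 1792),
    "Mac Studio M3 Ultra": (512, 819),
    "DGX Spark": (128, 273),
}

if __name__ == "__main__":
    weights_gb = 20.0  # hypothetical quantized model size
    for name, (capacity_gb, bandwidth_gb_s) in BOXES.items():
        ok = fits(capacity_gb, weights_gb)
        ceiling = decode_ceiling_tok_s(bandwidth_gb_s, weights_gb)
        print(f"{name}: fits={ok}, decode ceiling ~{ceiling:.0f} tok/s")
```

The point of the sketch is the shape of the trade-off: capacity decides whether the model loads at all, while bandwidth sets the best case for how fast it can answer.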

102

186

127K

MattTonsager retweeted

Cointelegraph

@Cointelegraph

8 days ago

⚡️ TODAY: Billionaire Kevin O'Leary says the next big thing in crypto is whichever blockchain lands at least one S&P 500 company across all 11 sectors. “That's going to be a game changer.”

179

968

148

253

130K

MattTonsager retweeted

Matt Mireles

@mattmireles

almost 5 years ago

“Press On”

When I was a child, the old man forced me to recite and memorize this speech

He wanted to mentally program us all

He carved it into the wall of his man-cave. Literally.

(He was an awesome dad) https://t.co/LjQObSUswt

717

186

MattTonsager retweeted

Ahmad

@TheAhmadOsman

15 days ago

Qwen 3.5 27B in NVFP4 w/ full context taking less than 20GB VRAM

You can basically run like 5 agents w/ full context on a single RTX PRO 6000 like this, and they'd be so fast

Tell me I didn't tell you this was gonna happen https://t.co/ppg6ZGQMi4
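The arithmetic behind the "less than 20GB" and "like 5 agents" figures can be sketched as follows. This is an estimate under stated assumptions, not a measurement: it assumes NVFP4 stores each weight in 4 bits plus one 8-bit scale per 16-weight block (about 4.5 bits per weight), that every weight is quantized, and that GB means 10^9 bytes. The function names are hypothetical.

```python
def nvfp4_weight_gb(params_billion: float, block_size: int = 16, scale_bits: int = 8) -> float:
    """Estimated weight footprint in GB, assuming 4-bit values plus one
    scale of `scale_bits` per `block_size` weights (assumed NVFP4 layout)."""
    bits_per_weight = 4 + scale_bits / block_size
    return params_billion * bits_per_weight / 8


def per_agent_budget_gb(vram_gb: float, agents: int) -> float:
    """VRAM available to each agent if the card is split evenly."""
    if agents <= 0:
        raise ValueError("agents must be positive")
    return vram_gb / agents


if __name__ == "__main__":
    weights = nvfp4_weight_gb(27)          # 27B-parameter model
    budget = per_agent_budget_gb(96, 5)    # 96 GB RTX PRO 6000, 5 agents
    print(f"weights ~{weights:.1f} GB, per-agent budget {budget:.1f} GB")
    print(f"left for KV cache and runtime per agent ~{budget - weights:.1f} GB")
```

Under these assumptions the weights take about 15 GB, and five independent copies on a 96 GB card would each need to stay within 19.2 GB, which is consistent with the "less than 20GB" claim. Sharing one copy of the weights across agents would leave far more room for context.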

411

224

36K

MattTonsager retweeted

Brian Armstrong

@brian_armstrong

16 days ago

Major areas where the financial system still needs an update:

1. Tokenization of real-world assets - Real estate, stocks, bonds, funds, etc. onchain for instant settlement, fractional ownership & massive distribution.
2. 24/7 Global trading - Pooled global liquidity, every asset, every person, with great leverage and capital efficiency.
3. Next-gen payments - Near-instant, low-cost global transfers using stablecoins, including for Agentic payments.
4. AI-powered risk, credit, compliance, and advice - Better decisions, less fraud, and broader access to capital. Everyone gets access to a great financial advisor.
5. Innovation-friendly regulation - Move from one-size-fits-all to risk-based rules that encourage innovation and competition instead of stifling it.
6. Expanded access - Open protocols that reduce middlemen and self-custodial wallets to expand access to everyone with a smartphone.
7. Capital formation - Low cost and turnkey for anyone to raise money for a good idea, increasing the number of startups.
8. Sound money - A refuge from inflation, when discipline is lost in fiat money.

Job's not done until we get these working for all. Will require lots of tech innovation and policy work to get there.

701

736

516K

MattTonsager retweeted

Ahmad

@TheAhmadOsman

16 days ago

Don’t know where to start with Local AI?

Read my Local LLMs From Zero to Hero series

It covers:
- Hardware
- Software
- Model Mechanics
- Everything else necessary

Needs no prior experience

Easy to understand for any background

Local / Opensource AI FTW https://t.co/5LKv4pAdtQ

426

687

75K

MattTonsager retweeted

Ahmad

@TheAhmadOsman

16 days ago

@perotta Do this + read my last 5 articles and you’ll save yourself months’ worth of trial and error https://t.co/3VmAeRGAez

MattTonsager retweeted

Ahmad

@TheAhmadOsman

8 months ago

3090 vs 4090 vs 5090 vs pro 6000 for LLMs

> we care about compute FLOPS, memory bandwidth,
> VRAM, L2 cache, interconnect, and power/TDP

> memory bandwidth ≅ tokens/sec, or how fast you can move tokens/KV-cache (decode ceiling)
> 5090 and both PRO 6000s bring 1.8 TB/s GDDR7 lanes (nearly double 4090/3090)
> so expect up to 2x decode throughput when bandwidth is your wall

> L2 cache ≅ how quickly you hit L2 vs get stuck on DRAM (latency killer)
> RTX 3090, Ampere, comes with very small L2 (6 MB) while
> Ada/Blackwell’s massive L2 (72 MB 4090 → 96 MB 5090 → 128 MB PRO 6000)
> which slashes DRAM trips, keeps KV cache on-die, making it
> way faster for small-batch/greedy/agent chain workloads
> and big boosts for streaming/latency-sensitive inference

> compute FLOPS (FP16/BF16/FP8) ≅ how fast you can run matmuls (prefill/training speed)
> 5090 comes with 419 TFLOPS FP16 (838 FP8)
> while the PRO 6000’s variations come with
> ≈20% (Workstation) and ≈5% (Max-Q) higher TFLOPS over the 5090
> PRO Workstation: 125 TFLOPS FP32 (≈504 FP16, ≈1.008 PFLOPS FP8)
> PRO Max-Q: 110 TFLOPS FP32 (≈439 FP16, ≈878 FP8)

> VRAM ≅ how much model + KV + batch fits on-GPU
> 96 GB VRAM on both PRO 6000s means KV-cache offload is easy
> multi-user 70B (FP4) chat, 32k+ context, and
> real throughput for big-batch or fine-tune

> interconnect = whether multi-GPU setups actually scale (NVLink vs PCIe), and how fast
> and NVLink is dead, except on 3090
> only the 3090 can do true dual-GPU mesh (112 GB/s NVLink)
> all others are PCIe-bound, and throttled in comparison to datacenter/HGX

> power/TDP ≅ whether you can sustain clocks, or just spike (PSU fried) and/or throttle
> 600W (PRO 6000 WS) and 575W (5090) means
> bring 1.5–2 kW PSU, 20A circuit, and lots of airflow
> Max-Q, at 96 GB, requires 300W but gets lower TFLOPS
> (making dense 4-GPU boxes doable)

quick recap of the GPU specs

RTX 3090 (Ampere GA102)
> 24 GB GDDR6X
> 936 GB/s bandwidth
> 6 MB L2 cache
> 142 TFLOPS FP16 tensor (285 sparse)
> PCIe 4.0 x16 + NVLink (112 GB/s, dual card only, the only consumer card with it)
> 350W TDP

RTX 4090 (Ada AD102)
> 24 GB GDDR6X
> 1.008 TB/s bandwidth
> 72 MB L2 cache
> 330 TFLOPS FP16 tensor
> 660 TFLOPS FP8 tensor
> PCIe 4.0 x16, no NVLink
> 450W TDP

RTX 5090 (Blackwell GB202 cut)
> 32 GB GDDR7
> 1.792 TB/s bandwidth
> 96 MB L2 cache (98 MB physical address space)
> 419 TFLOPS FP16 tensor
> 838 TFLOPS FP8 tensor
> PCIe 5.0 x16, no NVLink
> 575W TDP

RTX PRO 6000 (Workstation Edition, Blackwell)
> 96 GB GDDR7 (ECC)
> 1.792 TB/s bandwidth
> 128 MB L2 cache (full GB202 die)
> 125 TFLOPS FP32 → ≈504 FP16 → ≈1.008 PFLOPS FP8
> PCIe 5.0 x16
> 600W TDP

RTX PRO 6000 (Max-Q, Blackwell)
> 96 GB GDDR7 (ECC)
> 1.792 TB/s bandwidth
> 128 MB L2 cache
> 110 TFLOPS FP32 → ≈439 FP16 → ≈878 FP8
> PCIe 5.0 x16
> 300W TDP

so, what does all of that mean?

> 3090 with NVLink is the only consumer route to
> high-speed dual-GPU sync without going server/HGX
> but that comes with very small L2 cache, lower TFLOPS,
> and only 24 GB GDDR6X at 936 GB/s bandwidth per GPU

> 5090 (1.8 TB/s bandwidth + 96 MB L2) is for those who want
> sub-$2k single-GPU throughput, tokens/sec king, L2 cache monster

> PRO 6000 (WS or Max-Q) is for serving everything, 96 GB VRAM unlocks
> BIG SIZED models, with full context and no CPU offloading

for LLM inference
> 5090: king of single-GPU decode speed (1.8 TB/s bandwidth + 96 MB L2)
> PRO 6000 (WS/Max-Q): same bandwidth, larger L2 (128 MB), and 3x the 5090’s VRAM; pick 600W (Workstation) or 300W (Max-Q)
> 3090: budget dual-GPU with NVLink, still wins for parallel inference if you need interconnect
> 4090: still a killer in terms of L2 cache, but 24 GB VRAM limits context/batch for larger models, and no NVLink

for LLM training
> 5090: best raw compute for $ spent + bandwidth + 32 GB VRAM
> PRO 6000: massive VRAM (96 GB) removes batch/context bottleneck
> Max-Q: “fit four in a box” special 96 GB at 300W per card, dense workstations are back
> dual 3090 (NVLink): still out-syncs PCIe-only pairs in some parallel/fine-tune jobs

remember
> decode = bandwidth (tokens/sec) + L2 (latency)
> prefill/training = compute FLOPS + VRAM
> scale up = interconnect (NVLink on 3090, else PCIe Gen5)

more on the Buy a GPU website later this week :)
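The ratios between cards can be recomputed from the recap figures. The sketch below copies bandwidth, FP16 tensor TFLOPS, and TDP from the spec recap above. The helper names and the 300 W allowance for the rest of the system are hypothetical, and the power figure is a nameplate sum, not a PSU recommendation.

```python
# Figures copied from the spec recap above.
GPUS = {
    "RTX 3090": {"bw_gb_s": 936, "fp16_tflops": 142, "tdp_w": 350},
    "RTX 4090": {"bw_gb_s": 1008, "fp16_tflops": 330, "tdp_w": 450},
    "RTX 5090": {"bw_gb_s": 1792, "fp16_tflops": 419, "tdp_w": 575},
    "PRO 6000 WS": {"bw_gb_s": 1792, "fp16_tflops": 504, "tdp_w": 600},
    "PRO 6000 Max-Q": {"bw_gb_s": 1792, "fp16_tflops": 439, "tdp_w": 300},
}


def ratio(gpu: str, baseline: str, key: str) -> float:
    """How many times larger `gpu` is than `baseline` on one spec."""
    return GPUS[gpu][key] / GPUS[baseline][key]


def box_power_w(gpu: str, count: int, rest_of_system_w: int = 300) -> int:
    """Nameplate power for `count` cards plus a hypothetical allowance
    for CPU, drives, and fans."""
    return GPUS[gpu]["tdp_w"] * count + rest_of_system_w


if __name__ == "__main__":
    # Decode is bandwidth-bound, so the bandwidth ratio bounds the decode speedup.
    print(f"5090 vs 4090 bandwidth: {ratio('RTX 5090', 'RTX 4090', 'bw_gb_s'):.2f}x")
    print(f"5090 vs 3090 bandwidth: {ratio('RTX 5090', 'RTX 3090', 'bw_gb_s'):.2f}x")
    # Prefill and training are compute-bound, so compare FP16 TFLOPS.
    print(f"PRO 6000 WS vs 5090 FP16: {ratio('PRO 6000 WS', 'RTX 5090', 'fp16_tflops'):.2f}x")
    print(f"4x Max-Q box: {box_power_w('PRO 6000 Max-Q', 4)} W")
```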

409

408

69K

MattTonsager retweeted

BOOTOSHI 👑

@KingBootoshi

about 1 month ago

i’m starting to really favor simple, eval-driven prompts, especially for codex 5.5

"""
this is the desired end result.
{goal}

these conditions must be true in order to pass
{conditions}
"""

that’s it. define a goal and a passing condition. that’s it. it’s so easy now.
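A minimal sketch of filling that template programmatically, assuming the conditions are rendered as a plain bulleted list (the tweet does not say how they are formatted). `build_prompt` and the sample goal are hypothetical names for illustration.

```python
# Template wording taken from the tweet; the bullet formatting is an assumption.
TEMPLATE = """this is the desired end result.
{goal}

these conditions must be true in order to pass
{conditions}"""


def build_prompt(goal: str, conditions: list[str]) -> str:
    """Fill the goal/conditions template, one condition per bullet."""
    if not goal.strip():
        raise ValueError("goal must not be empty")
    if not conditions:
        raise ValueError("at least one passing condition is required")
    bullets = "\n".join(f"- {c.strip()}" for c in conditions)
    return TEMPLATE.format(goal=goal.strip(), conditions=bullets)


if __name__ == "__main__":
    print(build_prompt(
        "a CLI that converts CSV files to JSON",  # hypothetical goal
        ["all unit tests pass", "the command exits with code 0 on valid input"],
    ))
```

Requiring at least one condition keeps the prompt "eval driven": without a passing condition there is nothing for the model to check its work against.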

170

135

MattTonsager retweeted

Alex Finn

@AlexFinn

about 1 month ago

We got 100 years on this planet then you're dead forever

Yet you're too scared

Too scared to talk to that girl
To quit your job and do what you love
To start that business
To learn that new skill
To start that YT channel
To apply for that dream position
To DM that one person

100 years and you're going to spend it being a scared little kitty cat

Don't be that stone cold loser who lies on their deathbed filled with regret

594

176

27K

MattTonsager retweeted

Alex Finn

@AlexFinn

about 1 month ago

The most dangerous thing you can do right now is NOT use the latest AI tools. Period.

Every day a new company is laying off thousands of people who don't know how to use the most modern AI tools

If I were in the 9-5 world right now, this is every step I'd take:

1. Download Codex and build your first app. Learn how to implement a front end and database. AI can teach you all of this
2. Download OpenClaw or Hermes agent. Tell the agent about your entire life. Career, goals, and ambitions. Ask it what workflows it can implement to get you closer to those goals
3. Get Claude Design. Keep an eye out on X for visual language and designs you like. Feed this inspiration into Claude Design and get comfortable designing beautiful interfaces
4. Get the ChatGPT Pro plan. Feed GPT 5.5 Pro your hardest problems. Burn as many tokens as humanly possible with this model
5. Constantly look at your limits in all your AI plans. If you're ever above 50% on your limits, get angry that you're not burning enough tokens.
6. Learn how to use Claude Code side by side with Codex. Learn both their strengths. These are the only 2 coding tools that matter. Master these and you're golden

If you do these 6 things you are in an excellent position to not only be safe in your career, but also dominate those that don't pick up these skills.

143

170

142K

MattTonsager retweeted

Сarm1ne

@carm1nee

about 1 month ago

Paul Tudor Jones predicted the 1987 crash, made $100 million, then spent years trying to destroy this footage you will watch him lose $6 million in one afternoon, sit in his chair and say "total devastation" then make it all back with 100% interest This documentary will change how you think about risk forever Bookmark & watch it. Then read the post below - $90 billion from being right just 54% of the time↓

133

14K

31K

MattTonsager retweeted

Ahmad

@TheAhmadOsman

about 1 month ago

Let me make local AI easy for you

Give Codex CLI the tweet below & tell it:
- Infer the right inference engine from your hardware + tweet content below
- Use uv + venv
- Pick the right kernels
- Tune flags, batching, KV cache, etc.
- Optimize for your hardware & chosen model

Enjoy

343

519

31K

MattTonsager retweeted

Alex Finn

@AlexFinn

about 1 month ago

Pretty incredible

You have to try the new '/goal' feature in Codex

It worked for over an hour and built me an entire complex extraction shooter video game

You give it a goal, then it works endlessly until the goal is complete. It's like a Ralph loop. Can run for days

If you enable the image gen skill before you run the goal, it will even generate ALL the assets for your game autonomously. I didn't manually create ANY of the assets you see in the video

Recommendations: enable the image gen skill, put on skip all permissions, and give the prompt as much detail as you can. It will accomplish ALL of it

This has to be the sickest way to build games / long-running app tasks ever
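The "work until the goal is complete" pattern can be sketched generically as a loop around a completion check. This is an illustration of the idea only, not how Codex implements '/goal'. `run_until_goal`, `step`, and `goal_met` are hypothetical names, and the iteration cap is an added safety assumption.

```python
from typing import Callable


def run_until_goal(
    step: Callable[[int], None],
    goal_met: Callable[[], bool],
    max_iterations: int = 100,
) -> int:
    """Call `step` repeatedly until `goal_met()` is true.

    Returns the number of steps taken. Raises RuntimeError if the goal is
    still unmet after `max_iterations` steps, so the loop cannot run forever.
    """
    for i in range(max_iterations):
        if goal_met():
            return i
        step(i)
    if goal_met():
        return max_iterations
    raise RuntimeError("goal not met within max_iterations")


if __name__ == "__main__":
    # Toy stand-in for an agent: each step completes one of three tasks.
    todo = ["scaffold project", "write code", "make tests pass"]
    done: list[str] = []
    steps = run_until_goal(
        step=lambda i: done.append(todo[len(done)]),
        goal_met=lambda: len(done) == len(todo),
    )
    print(f"goal reached after {steps} steps: {done}")
```

In a real setup `step` would invoke the agent once and `goal_met` would run the checks that define completion, such as a test suite.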

153

171

301K

MattTonsager retweeted

Shruti

@heyshrutimishra

about 1 month ago

Ex-Tesla President just revealed Elon’s decision that changed everything...

Elon looked at their struggling online sales and asked one question: "How many clicks does it take to buy a Domino's pizza?"

They pulled it up. 10 taps. Tesla was at 64 clicks to buy a $120,000 car.

Elon's response: "We are 64. Domino's is 10. Let's go to 10."

Then they ran the data on their 360,000 car configurations. Customers were only buying two. They cut it to two. Sales exploded.

The most expensive product in the room was the hardest to buy. One comparison changed that forever.

441

377K

MattTonsager retweeted

Shruti

@heyshrutimishra

about 1 month ago

You're using OpenClaw wrong if it's still one chat window.

Anyone running everything in one chat knows the feeling. Nothing runs in parallel. Code waits on research, research waits on ops, ops waits on whatever you started yesterday. And every topic switch contaminates the next.

Telegram supergroup topics fix this. Each topic is a separate conversation, and the agent treats each one as its own context. Give each topic a job (Code, Research, Ops, Content), point OpenClaw at the group, and you've got what's basically four agents running in parallel that never talk to each other.

Setup takes an afternoon. Here's how:

Step 1. Install clawddocs first.

openclaw skills install clawddocs

This pulls 200+ pages of OpenClaw docs into the agent's context. Without it, every config question becomes a guess.

Step 2. Create a Telegram supergroup. In Group Settings, turn on Topics.

Each topic is a fully separate conversation. The agent doesn't carry context between them, which is the whole reason this works. It's not multitasking, it's hard isolation.

Step 3. Name the topics after jobs, not the agent. Code, Research, Ops, Content. Whatever your actual workflow is.

Step 4. Add the bot to the group. Make it admin and give it the Manage Topics permission.

Then open @BotFather, run /setprivacy, pick your bot, and choose Disable. Without that, the bot only reads messages that start with /, which means it ignores almost everything you type in the topics.

Step 5. Open a chat with the bot and tell it to find every group it's been added to, then update openclaw.json with that list. The bot pulls the list from Telegram itself. From then on, OpenClaw sees every group and every topic inside.

Step 6. That's it. Open whichever topic you need, ask the question, get an answer that isn't contaminated by the other three lanes.
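The isolation described in the steps above can be sketched as a history store keyed by group and topic. This is not OpenClaw's implementation. It only relies on the Telegram Bot API `Message` fields `chat.id`, `message_thread_id`, and `text`, and works on plain dicts shaped like those messages. `TopicContexts` and the sample IDs are hypothetical.

```python
from collections import defaultdict


class TopicContexts:
    """Keeps one conversation history per (chat, topic) pair."""

    def __init__(self) -> None:
        self._histories: dict[tuple, list[str]] = defaultdict(list)

    @staticmethod
    def key(message: dict) -> tuple:
        """Build the isolation key from a Telegram-style message dict."""
        chat_id = message["chat"]["id"]
        # message_thread_id is absent for messages outside a forum topic.
        thread_id = message.get("message_thread_id")
        return (chat_id, thread_id)

    def add(self, message: dict) -> list[str]:
        """Append the message text to its topic's history and return it."""
        history = self._histories[self.key(message)]
        history.append(message.get("text", ""))
        return history


if __name__ == "__main__":
    contexts = TopicContexts()
    group = {"id": -1001234567890}  # hypothetical supergroup ID
    contexts.add({"chat": group, "message_thread_id": 11, "text": "fix the failing test"})
    contexts.add({"chat": group, "message_thread_id": 22, "text": "summarize this paper"})
    code_history = contexts.add(
        {"chat": group, "message_thread_id": 11, "text": "now refactor it"}
    )
    print(code_history)  # only the messages from topic 11
```

Because the key includes the topic, a message in the Research topic never ends up in the Code topic's history, which is the "hard isolation" the post describes.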

253

498

25K

MattTonsager retweeted

Shruti

@heyshrutimishra

about 1 month ago

you can finally create a brand kit with GPT Image 2: from a single logo to a complete agency-style brand guideline board.

with the right prompt sequence, the board comes out 80–90% presentation-ready.

this SpaceX spec kit was built with:
- 1 logo/reference direction
- a sequence of prompts to expand and refine it

color systems, typography, UI blocks, icons, patterns, packaging, mobile screens, social posts, business cards, billboard mockups: all in one visual direction.

artifacts still show up in tiny text, but they're quick manual fixes.

zero to a full brand board: ~1 hour. 5–10 brand directions in a day.

want the prompts? drop a comment ... happy to share the full breakdown.

190

240

17K
