TooLazyToPrompt

28 days ago

This MTP pull request merge is getting more attention than many model drops.

I first noticed MTP while looking at Qwen3.5-0.8B, and now llama.cpp support makes the whole thing more interesting.

My current understanding is that MTP mainly improves token generation, not prompt processing. So it helps when the model is writing a lot: chat, coding, long answers, agents, synthetic data, local assistants. But if the workload is mostly a huge prompt plus a short answer, then prompt processing is still the bottleneck.

People are mentioning around 1.5x to 1.8x faster token generation in some setups.

My question is: how useful is this overall in real local AI workflows? Is MTP going to matter mainly for long generation and agent loops, or will it become a default feature people expect in models?
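A rough way to see why the workload mix matters: if only the generation phase gets faster, the end-to-end gain depends on how much of the total time was generation in the first place. The sketch below is plain arithmetic, not a benchmark. The timings (2 s and 30 s) are made-up illustrative numbers, the function name is hypothetical, and only the 1.5x figure comes from the reports mentioned above.

```python
def end_to_end_speedup(prompt_seconds: float,
                       generation_seconds: float,
                       generation_speedup: float) -> float:
    """Overall speedup when only the generation phase is accelerated."""
    before = prompt_seconds + generation_seconds
    # Prompt processing is unchanged; only generation time shrinks.
    after = prompt_seconds + generation_seconds / generation_speedup
    return before / after


# Hypothetical generation-heavy workload: short prompt, long answer.
print(round(end_to_end_speedup(2.0, 30.0, 1.5), 2))  # 1.45

# Hypothetical prompt-heavy workload: huge prompt, short answer.
print(round(end_to_end_speedup(30.0, 2.0, 1.5), 2))  # 1.02
```

With these assumed timings, the generation-heavy case keeps most of the 1.5x, while the prompt-heavy case barely moves.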



shadow_prompt retweeted

27 days ago

New UI Preview feature on Claude Code is really great.

I gave it a screenshot and asked it to make a navbar prettier. Instead of immediately editing CSS, it first asked me to choose a direction:
Refined gold pill
Sparkle prefix
Glow halo around text

That is the part I found useful. For frontend work, “make it prettier” is not a coding instruction. It is a taste decision. Claude Code did not jump straight from prompt to diff. It stopped at the subjective layer first.

The flow felt like:
visual context → design options → human choice → code edit

All in a single clean flow.



shadow_prompt retweeted

about 1 month ago

This is one of the most crucial lessons in First Break AI. It teaches you how to navigate @huggingface like a pro.

Not just: download model → run notebook → move on

In this lesson, we go deeper. We look at how open model repos are structured, how to read model files, how config.json connects to the actual model class, and how to trace from a Hugging Face model page into the Transformers code that runs the model. We use Qwen3-0.6B as the learning model.

We also look at why Markdown matters so much in AI workflows: model cards, GitHub issues, README files, Discord, Cursor, Claude Code, planning docs, and AI-assisted work.

Then comes the biggest win: datasets. Working with datasets is a core AI engineering skill. I show 3 ways to analyze datasets on Hugging Face:
Croissant endpoint
Data Studio / browser viewer
load_dataset with Python, pandas, and plots

We inspect dataset structure, categories, response lengths, distribution, short examples, long examples, and how to think about dataset quality before using it for training or fine-tuning.

And this sets up the next part: running Qwen3 directly in C, without treating Transformers as magic.

Lesson 01: Hugging Face Beyond Upload
Watch: https://t.co/GF8ZCNk5WN
Free cohort: https://t.co/0H4qIVOpGj
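To make the config.json → model class link concrete, here is a minimal sketch. It is not taken from the lesson. The JSON string is a trimmed, illustrative excerpt (assumed here, not copied from the repo), the helper names are hypothetical, and the path pattern is a convention that many Transformers models follow but not a guarantee for every model.

```python
import json

# Illustrative, trimmed excerpt of a config.json (an assumption, not the full file).
CONFIG_TEXT = '{"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3"}'


def model_class_name(config: dict) -> str:
    """The class name a loader would look up, taken from `architectures`."""
    return config["architectures"][0]


def modeling_file_for(config: dict) -> str:
    """Guess the modeling file in the Transformers repo from `model_type`.

    Assumes the common layout src/transformers/models/<type>/modeling_<type>.py.
    """
    model_type = config["model_type"]
    return f"src/transformers/models/{model_type}/modeling_{model_type}.py"


config = json.loads(CONFIG_TEXT)
print(model_class_name(config))   # Qwen3ForCausalLM
print(modeling_file_for(config))  # src/transformers/models/qwen3/modeling_qwen3.py
```

Reading the two fields first and then opening the matching file is one way to trace from a model page into the code that actually runs the model.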


shadow_prompt retweeted

about 1 month ago

“Once you start learning about stuff, the density of accessible information increases in an extremely literal sense: You are able to engage with more of the world than you were before, even though the amount of physical world around you has not changed”
- Quote from “How I read”

This is a great way to put into words my reaction when I discover a new perspective. One word in Mumbai slang describes this perfectly: “Aaila.”

Link to the original post: https://t.co/S37sXaXobC


shadow_prompt retweeted

about 1 month ago

- Gothically gorgeous
- Hauntingly beautiful


about 1 month ago

RT @thefirehacker: First Break AI https://t.co/DN9Paa5GJw Cohort: 1 May 2026 — 30 June 2026 (2 months)


shadow_prompt retweeted

about 1 month ago

👇 Epic prompt for learning: create clean Japanese-style posters. Use Claude to create 3D mockups instantly. The mockups could be better; I am looking for a way to generate better ones.

Interestingly, on Qwen 3.5 0.8B I came across an MTP (multi-token prediction) side branch in the mockup. This is used for speculative decoding, and apparently other models are shipping this too (DeepSeek, GLM; need to check).

NORMAL PATH
tokens / vision tokens
↓
24 decoder layers
↓
RMSNorm
↓
tied LM head
↓
predict next token t+1

MTP SIDE PATH
main hidden state + token embedding
↓
fusion projection: fc.weight [1024, 2048]
↓
one small decoder-like layer: mtp.layers.0
↓
mtp norm
↓
same / tied vocab projection
↓
draft token t+2 / t+3 ...

I will include this topic in our cohort as well https://t.co/xLmTIU0rq6
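For anyone wondering how draft tokens turn into a speedup, here is a minimal sketch of greedy speculative-decoding verification. It is a toy, not the Qwen or llama.cpp implementation. The function names are hypothetical, the "main model" is a stand-in that just counts upward, and real runtimes check all drafts in one batched forward pass rather than calling the model token by token.

```python
from typing import Callable, List, Sequence


def accept_draft(context: Sequence[int],
                 draft: Sequence[int],
                 main_next_token: Callable[[Sequence[int]], int]) -> List[int]:
    """Keep draft tokens while they match the main model's greedy choice.

    On the first mismatch, emit the main model's token instead and stop.
    If every draft token matches, the main model adds one extra token.
    """
    accepted: List[int] = []
    for token in draft:
        expected = main_next_token(list(context) + accepted)
        if token != expected:
            accepted.append(expected)  # the main model's token replaces the bad draft
            return accepted
        accepted.append(token)
    accepted.append(main_next_token(list(context) + accepted))  # bonus token
    return accepted


def toy_main_model(tokens: Sequence[int]) -> int:
    """Stand-in for the main path: always predicts last token + 1 (mod 10)."""
    return (tokens[-1] + 1) % 10


print(accept_draft([1, 2, 3], [4, 5, 9], toy_main_model))  # [4, 5, 6]
print(accept_draft([1, 2, 3], [7], toy_main_model))        # [4]
```

The output always matches what the main model would have produced on its own, so draft quality affects speed, not the result: good drafts yield several tokens per verification step, bad drafts yield one.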


shadow_prompt retweeted

about 1 month ago

Only the 5-hour limits are doubled, which means the weekly limits will hit even faster 👽🤣



shadow_prompt retweeted

about 1 month ago

Cost/Perf tradeoffs & Evals are the most requested topics for this cohort. I was not expecting these to make top 3. Real life signals are always different from my assumptions.


shadow_prompt retweeted

about 1 month ago

Moved my site to a custom domain. Google traffic vanished overnight. Google had picked my old *.pages.dev URL as the canonical and was treating my real domain as a copy. 20-minute fix with Claude + 1 git command nobody talks about.


shadow_prompt retweeted

about 2 months ago

I found this feature in Cursor today: "Generate commit message". Very useful.

No need to prompt CC or Cursor with "generate commit msg for staged code". https://t.co/EGPrsvFnHc


about 2 months ago

@thefirehacker Great find!


about 2 months ago

Brilliant way to host your trainings or even start learning something new!

about 2 months ago

We need more real LLM training case studies.
If you’ve seen or shared actual training runs, drop them here 👇
💡 We’re collecting these and turning them into structured breakdowns that are easy to learn and apply.

Reading the Curves:
How real LLMs learn, spike, recover, and stabilize.

👉 https://t.co/uovuNxS0le
The Marin pretraining run is now part of our pre-cohort blog + office hours.

During the cohort, we’ll keep collecting and publishing real case studies across open-source models and experiments.

Cohort: 1 May to 30 June (2 months)



about 2 months ago

Brilliant platform to host your training or simply start learning.

about 2 months ago

We need more real LLM training case studies.
If you’ve seen or shared actual training runs, drop them here 👇
💡 We’re collecting these and turning them into structured breakdowns that are easy to learn and apply.

Reading the Curves:
How real LLMs learn, spike, recover, and stabilize.

👉 https://t.co/uovuNxS0le
The Marin pretraining run is now part of our pre-cohort blog + office hours.

During the cohort, we’ll keep collecting and publishing real case studies across open-source models and experiments.

Cohort: 1 May to 30 June (2 months)


shadow_prompt retweeted

about 2 months ago

First Break AI
Your first break in AI: a guided journey from first commit to capstone.

Free, open cohort to upskill in training, inference, and AI product building. https://t.co/xLmTIU0rq6

The easy-to-follow roadmap and the AI podcast guided journey are up.
Weekly office hours (Friday)
Join the Discord server: https://t.co/ailadSswLY


shadow_prompt retweeted