FireHacker

Verified account

@thefirehacker

Founder @ AIEDX. Building — AI agent traffic intelligence. First Break AI: LLM training, inference & AI product.

Mumbai

Joined April 2011

431 Following

499 Followers

7.9K Posts

Pinned Tweet

about 1 month ago

This is one of the most crucial lessons in First Break AI. It teaches you how to navigate @huggingface like a pro.

Not just: download model → run notebook → move on

In this lesson, we go deeper. We look at how open model repos are structured, how to read model files, how config.json connects to the actual model class, and how to trace from a Hugging Face model page into the Transformers code that runs the model. We use Qwen3-0.6B as the learning model.

We also look at why Markdown matters so much in AI workflows: model cards, GitHub issues, README files, Discord, Cursor, Claude Code, planning docs, and AI-assisted work.

Then comes the biggest win: datasets. Working with datasets is a core AI engineering skill. I show 3 ways to analyze datasets on Hugging Face:

- Croissant endpoint
- Data Studio / browser viewer
- load_dataset with Python, pandas, and plots

We inspect dataset structure, categories, response lengths, distribution, short examples, long examples, and how to think about dataset quality before using it for training or fine-tuning.

And this sets up the next part: running Qwen3 directly in C, without treating Transformers as magic.

Lesson 01: Hugging Face Beyond Upload
Watch: https://t.co/GF8ZCNk5WN
Free cohort: https://t.co/0H4qIVOpGj
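To make the "config.json connects to the actual model class" step concrete, here is a minimal sketch of how one might trace from a config file to the code that runs the model. The JSON excerpt, the helper name `locate_model_code`, and the repository path pattern are assumptions for illustration; a real config.json has many more fields, and the file layout should be checked against the Transformers repository.

```python
import json

# Illustrative excerpt only (assumed values); a real config.json has many more fields.
CONFIG_JSON = """
{
  "architectures": ["Qwen3ForCausalLM"],
  "model_type": "qwen3"
}
"""


def locate_model_code(config_text: str) -> dict:
    """Map config.json fields to a class name and a likely source file.

    Hypothetical helper: the path pattern below is an assumed convention,
    not something read from the library itself.
    """
    config = json.loads(config_text)
    model_type = config["model_type"]
    class_name = config["architectures"][0]
    # Assumed layout: src/transformers/models/<model_type>/modeling_<model_type>.py
    source_path = f"src/transformers/models/{model_type}/modeling_{model_type}.py"
    return {
        "class_name": class_name,
        "model_type": model_type,
        "source_path": source_path,
    }


info = locate_model_code(CONFIG_JSON)
print(info["class_name"], "->", info["source_path"])
```

The sketch only shows the lookup direction: `architectures` names the class to search for, and `model_type` hints at the folder where that class is likely defined.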

0

10

5

3

273

2 days ago

Is there a prompt guide for Fable? Fable uses most of the quota in just a few prompts and still feels nerfed. I tried to use Fable for a serious task like product analysis. It gave sharp analysis, but the model seems shy about tool calls. It doesn't want to collect a lot of information, and I had to push it hard to do real analysis. The analysis overall is sharper than Opus, but this feels like a nerfed model. Is there a prompt guide or any direction on how to use this model effectively?

0

2

1

1

83

thefirehacker retweeted

3 days ago

@karpathy This is not a day for celebrating, Andrej. It's a very dark and very sad day, and the damage may be impossible to undo.

105

4K

244

603

373K

thefirehacker retweeted

3 days ago

Labs starting to pull up the ladders on the ability to diffuse AI was inevitable. Doing it without telling the user is misaligned.

57

2K

188

297

286K

3 days ago

I built a small visualization layer on top of a local Qwen3 in Pure C to understand LLM output.

Image shows why sampling is not greedy decoding: a lower-probability token can still get selected when temperature/top-p keep it inside the candidate pool.

I would also love feedback on what would make a visualization like this more useful for learning:

- KV cache view?
- attention heatmaps?
- speculative decoding comparison?
- greedy vs top-p side-by-side?
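The point about sampling versus greedy decoding can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Pure C implementation from the post: the logits, the temperature of 1.0, the top-p of 0.9, and all function names are made up for the example.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_pool(probs, top_p):
    """Smallest set of tokens whose cumulative probability reaches top_p.

    Returns (token_index, renormalized_probability) pairs.
    """
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    pool, mass = [], 0.0
    for idx, p in ranked:
        pool.append((idx, p))
        mass += p
        if mass >= top_p:
            break
    return [(idx, p / mass) for idx, p in pool]


def sample(logits, temperature, top_p, rng):
    """Draw one token index from the top-p candidate pool."""
    pool = top_p_pool(softmax(logits, temperature), top_p)
    indices = [i for i, _ in pool]
    weights = [w for _, w in pool]
    return rng.choices(indices, weights=weights, k=1)[0]


# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.5, 0.2, -1.0]
greedy = max(range(len(logits)), key=lambda i: logits[i])

rng = random.Random(0)  # fixed seed so runs are repeatable
draws = [sample(logits, 1.0, 0.9, rng) for _ in range(1000)]
pool_ids = [i for i, _ in top_p_pool(softmax(logits, 1.0), 0.9)]

print("greedy token:", greedy)
print("candidate pool:", pool_ids)
print("tokens actually sampled:", sorted(set(draws)))
```

Greedy decoding always returns the single highest-probability token, while the sampler can return any token that survives the top-p cut, which is why a lower-probability token still shows up among the draws.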

1

4

1

1

105

thefirehacker retweeted

4 days ago

have been recently thinking about why pretrain research matters among the seemingly more crucial data/compute/rl bottlenecks and sharing my take here on what makes pretrain research (still!) vital:

1. better computational efficiency: scalinglaw shifts, 2x less FLOPS needed to achieve the same loss, etc. plus e.g. long context settings where switching to hybrid or sparse attn can save you >90% FLOPS. many model arch / optimizer improvements can save you >20% flops needed for the same loss - those are research innovations on every axis from training iter dimension to inter-layer and intra-layer. the effect of compounded architecture advantage is very distinctive given that ur always improving against your sota baseline. good pretrain research might very well have already delivered you a 10x more efficient (and likewise, better under the same compute) model arch compared to three years ago, and there's still obv many inefficiencies left to be optimized. over half of the compute is still spent on pretraining when you do new from-scratch model trainings rn, and having weeks & months saved there could really allow much more rapid iterations across the entire stack, compounded.

2. to train models one couldn't have been able to previously: residuals, optimizers, etc. this one's less common since most of the arch innovations don't offer more beyond the expressivity gain. but there are significant ones which can e.g. provide more stable learning dynamics (both theoretically and in practice) at all scales so one could scale up. new model configs or forms of training also come back to better efficiency

data/compute/FLOPS bottlenecks certainly exist but are relatively more orthogonal to pretrain research and imo it is unclear whether one will be a clear intelligence bottleneck a year from now than the other.
in hindsight ive been using "pretrain research" tho this itself is an inefficiency (with further inefficiencies under its scaling law) and "deep learning research" is a better phrasing.

3

265

18

156

31K

thefirehacker retweeted

4 days ago

@_arohan_ My response to the claim that Muon is a renamed version of Shampoo https://t.co/Qlwo34C4t4

0

67

2

19

7K

thefirehacker retweeted

5 days ago

one of my favorite projects is Marin from the stanford folks, they have a scientific approach to training, are ready to take risks and are fully open (even open development where you can follow everything on github!) https://t.co/G12JfPlFJP

5

282

13

146

33K

thefirehacker retweeted

Sebastian Raschka

5 days ago

https://t.co/L4s34f28oX

51

1K

125

1K

135K

7 days ago

Claude Cowork with blender is so much fun, still work in progress will post the final scene soon. Trying out if it can build basic geometry nodes scene like waves hitting a beach 🌊🏖️

1

4

3

3

63

thefirehacker retweeted

9 days ago

We have another 65 page frontier model report from Nvidia to read @eliebakouch @stochasticchasm and gang

18

687

59

387

55K

thefirehacker retweeted

Fetchlens.ai @fetchlens

9 days ago

52% of MCP servers are dead within 90 days. But the median server has 6 commits — lifetime. The protocol works. The logic layer doesn't exist. Content goes stale. Tools stay isolated. Nobody monitors what fails. Full research: https://t.co/xCk7HPZbce

0

4

4

3

64

thefirehacker retweeted

Aditya Raj Kaul

10 days ago

Extremely Rare Red Sprites Spotted Flashing Over Tibet. They are caused by high levels of electrical activity and form in the upper atmosphere during powerful thunderstorms.

208

24K

4K

3K

581K

thefirehacker retweeted

Matej Sirovatka

10 days ago

KV Cache re-use is the most important thing for agentic rollouts. We've integrated Mooncake Store into prime-rl with vLLM, you can now use it as a drop-in replacement for native CPU/Disk offloading, giving you cross-node prefix cache reuse to make your agents go brrr🚀

14

337

25

137

31K

thefirehacker retweeted

10 days ago

WOW microsoft new "MAI Thinking 1" model comes with a 109 page tech report that looks REALLY detailed, this is amazing

24

987

120

680

199K

thefirehacker retweeted

11 days ago

🎯 Andrej Karpathy on how to learn.

51

6K

747

4K

207K

thefirehacker retweeted

15 days ago

Fantastic. Mostly 𝐬𝐢𝐧(𝐱). Name it... Fireeel? Remix in #Wolfram Mathematica. Full code below.

x = Range[0., 9999];
k = 4 Cos[x/21];
e = x/1880 - 20;
d = Sqrt[k^2 + e^2];
m = UnitStep[k^2 - 15];
Manipulate[With[{
  q = 3 Sin[2 k]+.3/k+k*Sin[x/4465](9+2*Sin[14*e-3*d+2*t])},
 Graphics[{
   Blend[{White, Red}, Sin[t]^2], Opacity[.5], PointSize[.01],
   Point@Pick[#, m, 1],
   White, Opacity[.75], PointSize[.0025],
   Point@Pick[#, m, 0]}&@
  Transpose@{q+50*Cos[d-t]+200, 875-q*Sin[d-t]-39*d},
  PlotRange -> {{100, 300}, {75, 320}},
  Background -> Black]],
 {t, 0, 2 Pi}]

7

456

60

197

19K

thefirehacker retweeted

16 days ago

Indian scientists just made history.

Researchers from IIT Madras and IISc Bengaluru just pulled off something impossible.

They've created the world's "first carbon-free ferrocene".

This means we can finally build the next generation of incredibly durable tech.

Let me explain.

See, ferrocene is this wild organometallic molecule - where an iron atom is perfectly sandwiched between two carbon rings.

But it’s insanely stable.

Which is why it is already used in rocket fuels, car gasoline additives, long-life batteries, and even cancer medicines.

And for the last 75 years, everyone thought it was impossible to build the same stable structure without using carbon.

But this team of Indian scientists proved everyone wrong.

They created the same perfect sandwich structure - by swapping iron for osmium and carbon rings for boron rings.

And what they got was the world's first carbon-free ferrocene - which is so much stronger than the carbon bonds.

By doing so - they've opened up a whole new era of chemistry. And we have no idea how many amazing things we might discover.

But to think all of this started in India is truly amazing.

Kudos to everyone on this team: Sundargopal Ghosh, Stutee Mohapatra, Suvam Saha, Urvashi Gupta, Deepak Patel - from IIT Madras, Gaurav Joshi and Eluvathingal D. Jemmis - from IISc Bengaluru.

83

6K

2K

795

171K

thefirehacker retweeted

17 days ago

new minimax sparse attention compared to deepseek v3.2 (DSA) and v4 (CSA)

main changes:
- based on GQA not MLA
- block level selection like in CSA but attention is done on the real KV, not in the compressed dimension

23

702

70

315

128K

thefirehacker retweeted

19 days ago

New blog! Covers a lot of papers and methods about recent advances in On policy distillation and On policy self distillation, their wins, their failure modes, and my opinion about the same! Link below, please do check it out, and RT/QT if you like it:)

24

513

69

552

70K

thefirehacker retweeted

Sebastian Raschka

20 days ago

Added a DeepSeek Sparse Attention (DSA) from-scratch implementation to my LLMs-from-scratch repo thanks to an awesome new reader contrib. With motivation, overview, and GPT-style model reference implementation as standalone example code: https://t.co/o2PMhjF0TN

44

2K

242

1K

74K
