Rohan @Beast27Mercy - Twitter Profile

Rohan @Beast27Mercy

2 days ago

@MikaOSRS Love your positivity and big brain breach strats. Can't wait for next year!

0

1

0

411

Beast27Mercy retweeted

Tech with Mak

@techNmak

about 1 month ago

This math sits underneath every AI model being trained right now.

Gradient. Jacobian. Hessian.

Three words that look intimidating at first.

But they are really just three ways of measuring change.

𝟭. 𝗚𝗿𝗮𝗱𝗶𝗲𝗻𝘁 ∇f
Takes a scalar function:

f : ℝⁿ → ℝ

Returns a vector of first-order partial derivatives.

It answers:

"Which direction makes f increase fastest?"

That is why gradients are central to optimization.

Gradient descent moves in the opposite direction because the gradient points uphill.

Backpropagation efficiently computes gradients during training.
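
A minimal sketch of the "move opposite to the gradient" idea, in plain Python. The function `bowl`, the step size, and the step count are illustrative assumptions, and the gradient is estimated with central differences rather than backpropagation.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        forward = x[:i] + [x[i] + h] + x[i + 1:]
        backward = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def gradient_descent(f, x0, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, since the gradient points uphill."""
    x = list(x0)
    for _ in range(steps):
        g = numerical_gradient(f, x)
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x


def bowl(x):
    """Hypothetical test function with its minimum at (3, -1)."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


minimum = gradient_descent(bowl, [0.0, 0.0])
```

Here `minimum` should land very close to `[3.0, -1.0]`. Real training replaces the finite-difference estimate with gradients from backpropagation, which scale to millions of parameters.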

𝟮. 𝗝𝗮𝗰𝗼𝗯𝗶𝗮𝗻 J_F
Takes a vector-valued function:

F : ℝⁿ → ℝᵐ

Returns an m × n matrix of first-order partial derivatives.

It answers:

"How does each output change with each input?"

The Jacobian is the local linear map of a vector-valued function.

It shows up in:
→ sensitivity analysis
→ change of variables
→ automatic differentiation
→ forward-mode AD
→ reverse-mode AD / backpropagation

In simple terms:

forward-mode AD uses Jacobian-vector products.

reverse-mode AD uses vector-Jacobian products.
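
The two products can be sketched with an explicit Jacobian. The function F(x, y) = (x·y, x + y, x²) is a made-up example, and real AD systems compute these products without ever building the full matrix; it is written out here only to make the shapes visible.

```python
def jacobian_F(x, y):
    """Analytic 3 x 2 Jacobian of the hypothetical F(x, y) = (x*y, x + y, x**2)."""
    return [
        [y, x],          # d(x*y)/dx,  d(x*y)/dy
        [1.0, 1.0],      # d(x+y)/dx,  d(x+y)/dy
        [2.0 * x, 0.0],  # d(x**2)/dx, d(x**2)/dy
    ]


def jvp(J, v):
    """Jacobian-vector product J @ v: pushes an input direction (R^n) to outputs (R^m)."""
    return [sum(J[i][j] * v[j] for j in range(len(v))) for i in range(len(J))]


def vjp(u, J):
    """Vector-Jacobian product u^T @ J: pulls an output weighting (R^m) back to inputs (R^n)."""
    return [sum(u[i] * J[i][j] for i in range(len(u))) for j in range(len(J[0]))]


J = jacobian_F(2.0, 3.0)
tangent_out = jvp(J, [1.0, 0.0])        # one column of J: effect of nudging x
cotangent_in = vjp([1.0, 0.0, 0.0], J)  # one row of J: gradient of the first output
```

One JVP recovers a column of the Jacobian and one VJP recovers a row. That is why reverse mode suits training: a scalar loss has a single row, so one backward pass yields the whole gradient.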

𝟯. 𝗛𝗲𝘀𝘀𝗶𝗮𝗻 H_f
Takes a scalar function:

f : ℝⁿ → ℝ

Returns an n × n matrix of second-order partial derivatives.

It answers:

"How does the gradient itself change?"

That means the Hessian measures curvature.

When the second partial derivatives are continuous, the Hessian is symmetric.

At a critical point:
→ positive definite Hessian → strict local minimum
→ negative definite Hessian → strict local maximum
→ indefinite Hessian → saddle point
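
For two variables the test above can be sketched directly from the eigenvalues of a symmetric 2 × 2 Hessian [[a, b], [b, c]]. The function name and tolerance are assumptions for illustration. The sketch also reports the case the list leaves out: when an eigenvalue is zero, the second-order test alone does not decide.

```python
import math


def classify_critical_point(a, b, c, tol=1e-12):
    """Classify a critical point from the symmetric Hessian [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    low, high = mean - radius, mean + radius  # the two eigenvalues
    if low > tol:
        return "strict local minimum"   # positive definite
    if high < -tol:
        return "strict local maximum"   # negative definite
    if low < -tol and high > tol:
        return "saddle point"           # indefinite
    return "inconclusive"               # some eigenvalue is (numerically) zero


bowl_result = classify_critical_point(2.0, 0.0, 2.0)     # f = x^2 + y^2 at the origin
saddle_result = classify_critical_point(2.0, 0.0, -2.0)  # f = x^2 - y^2 at the origin
dome_result = classify_critical_point(-2.0, 0.0, -2.0)   # f = -x^2 - y^2 at the origin
flat_result = classify_critical_point(0.0, 0.0, 0.0)     # f = x^4 + y^4 at the origin
```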

The clean mental model

Gradient = first derivatives of one output
→ tells you direction

Jacobian = first derivatives of many outputs
→ tells you sensitivity

Hessian = second derivatives of one output
→ tells you curvature

And the relationship between them is simple:
The Hessian is the Jacobian of the gradient.

For a scalar output, the Jacobian contains the same partial derivatives as the gradient, up to row/column convention.
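
"The Hessian is the Jacobian of the gradient" can be checked numerically. This sketch assumes the example function f(x, y) = x²y + y³, whose gradient is written by hand, and differentiates that gradient with central differences. The helper names are hypothetical.

```python
def gradient(p):
    """Analytic gradient of the hypothetical f(x, y) = x**2 * y + y**3."""
    x, y = p
    return [2.0 * x * y, x ** 2 + 3.0 * y ** 2]


def numerical_jacobian(F, p, h=1e-5):
    """Central-difference Jacobian of a vector-valued F at point p."""
    m, n = len(F(p)), len(p)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = p[:j] + [p[j] + h] + p[j + 1:]
        minus = p[:j] + [p[j] - h] + p[j + 1:]
        f_plus, f_minus = F(plus), F(minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


# Jacobian of the gradient at (1, 2); analytically the Hessian is [[2y, 2x], [2x, 6y]].
hessian = numerical_jacobian(gradient, [1.0, 2.0])
```

The result should be close to [[4, 2], [2, 12]], and its symmetry reflects the continuity of the second partial derivatives.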

Same idea:
measure change.

Different object:
direction, sensitivity, curvature.

Once this clicks, optimization stops looking like a pile of formulas.

It starts looking like a map of the problem.

30

1K

345

1K

52K

Beast27Mercy retweeted

🅿️

@the_P_God

about 1 month ago

Logged on RuneScape for the first time in a while. Map looks a little different.

107

23K

1K

857

872K

Beast27Mercy retweeted

Tech with Mak

@techNmak

about 1 month ago

These are literally the kind of LLM interview questions most candidates wish they had seen earlier.

A curated list of 50 LLM interview questions - shared by Hao Hoang.

What's covered:

Fundamentals:
→ Tokenization and why it matters
→ Attention mechanisms in transformers
→ Context windows and their tradeoffs
→ Embeddings and initialization
→ Positional encodings

Fine-tuning & Efficiency:
→ LoRA vs QLoRA
→ PEFT to prevent catastrophic forgetting
→ Model distillation
→ Adaptive Softmax for large vocabularies

Generation & Decoding:
→ Beam search vs greedy decoding
→ Temperature, top-k, top-p sampling
→ Autoregressive vs masked models

Advanced Concepts:
→ RAG (Retrieval-Augmented Generation)
→ Chain-of-Thought prompting
→ Mixture of Experts (MoE)
→ Knowledge graph integration
→ Zero-shot and few-shot learning

Math & Theory:
→ Softmax in attention
→ Cross-entropy loss
→ KL divergence
→ Gradient computation for embeddings
→ Vanishing gradient solutions in transformers

You don't need to follow me (@techNmak) and comment "LLM". I will put the link in the comments.

18

369

75

580

20K

Beast27Mercy retweeted

R𝛼m 🦅

@rambuilds_

about 2 months ago

As an AI Infrastructure Engineer.

Please learn:
- GPU/VRAM fundamentals, quantization & batching
- vLLM / TensorRT-LLM / inference optimization
- KV caching, speculative decoding & token throughput
- Distributed training basics (DDP/FSDP/DeepSpeed)
- Model serving & autoscaling
- Vector DB retrieval pipelines
- Prompt caching & cost optimization
- Observability for LLM apps

This is what production AI teams actually care about.

10

648

78

761

18K

Beast27Mercy retweeted

Ahmad

@TheAhmadOsman

about 2 months ago

How to go about learning all of this?

1st: Start with the serving engine view
- vLLM: PagedAttention, continuous batching, prefix caching, CUDA graphs
- SGLang: RadixAttention/prefix reuse, speculative decoding, MoE, structured/agent workloads
- TensorRT-LLM: NVIDIA peak stack, FP8/FP4, Wide-EP, disaggregated serving
- FlashInfer: reusable kernel/operator library for attention/GEMM/MoE/sampling

2nd: Go down the stack
- Triton tutorials → custom fused kernels
- CUTLASS/CuTe → Tensor Core GEMM and Blackwell/Hopper details
- FlashAttention papers → attention algorithm/kernel co-design
- PagedAttention paper → KV-cache memory management
- MoE docs → routing + grouped GEMM + all-to-all
- Nsight profiling → stop guessing

3rd: Do this mini-project sequence
1. Implement RMSNorm in Triton; compare to PyTorch
2. Implement fused SiLU × gate
3. Implement simple FP16 matmul; compare to cuBLAS/rocBLAS
4. Implement paged KV lookup for decode attention
5. Add FP8 KV cache with per-block scales
6. Implement toy top-k sampling on GPU
7. Implement tiny MoE dispatch + grouped GEMM
8. Integrate one custom op into vLLM or SGLang and profile end-to-end

27

1K

128

2K

136K

Beast27Mercy retweeted

Rosanna Pansino

@RosannaPansino

2 months ago

Had so much fun on @qtcinderella’s ‘Master Baker’ as a guest judge for the finale! 🍪🍰🥧🧁🍩🎂

@emiru
@TheJoshElkin
@Valkyrae
@Patrickzeinali
@AustinOnTwitter
@YourRAGEz
@supertf
@extraemilyy https://t.co/E9h3pjseIT

47

7K

142

161

237K

Beast27Mercy retweeted

Tech with Mak

@techNmak

2 months ago

These are literally the kind of LLM interview questions most candidates wish they had seen earlier.

A curated list of 50 LLM interview questions - shared by Hao Hoang.

What's covered:

Fundamentals:
→ Tokenization and why it matters
→ Attention mechanisms in transformers
→ Context windows and their tradeoffs
→ Embeddings and initialization
→ Positional encodings

Fine-tuning & Efficiency:
→ LoRA vs QLoRA
→ PEFT to prevent catastrophic forgetting
→ Model distillation
→ Adaptive Softmax for large vocabularies

Generation & Decoding:
→ Beam search vs greedy decoding
→ Temperature, top-k, top-p sampling
→ Autoregressive vs masked models

Advanced Concepts:
→ RAG (Retrieval-Augmented Generation)
→ Chain-of-Thought prompting
→ Mixture of Experts (MoE)
→ Knowledge graph integration
→ Zero-shot and few-shot learning

Math & Theory:
→ Softmax in attention
→ Cross-entropy loss
→ KL divergence
→ Gradient computation for embeddings
→ Vanishing gradient solutions in transformers

You don't need to follow me (@techNmak) and comment "LLM". I will put the link in the comments.

30

950

155

2K

62K

Beast27Mercy retweeted

Turing Post

@TheTuringPost

2 months ago

13+ Attention mechanisms you should know

▪️ Self-attention
▪️ Cross-attention
▪️ Causal attention
▪️ Linear Attention
▪️ Softmax attention
▪️ Sliding Window (local attention)
▪️ Global attention
▪️ FlashAttention
▪️ Multi-Head Attention (MHA)
▪️ Multi-Query Attention (MQA)
▪️ Grouped-Query Attention (GQA)
▪️ Multi-Head Latent Attention (MLA)
▪️ Interleaved Head Attention (IHA)

+ Slim Attention, KArAt, XAttention, Mixture-of-Depths Attention (MoDA)

Save the list and explore more about them here: https://t.co/yQVLSnQB61

12

2K

293

2K

140K

Beast27Mercy retweeted

Evan Luthra

@EvanLuthra

2 months ago

Anthropic pays engineers $750,000+ a year to understand how LLMs work. Stanford just put a 2 hour lecture that covers 80% of it for FREE. Bookmark this. Give it 2 hours today. It might be the highest ROI thing you do this month:

230

22K

3K

52K

3M

Beast27Mercy retweeted

Vyom 👾

@HelloVyom

2 months ago

bookmark this!!!

The AI interview meta changed. companies like Anthropic & OpenAI are now asking you to implement attention mechanisms from scratch in live rounds.

free repos that actually cover this 👇 https://t.co/ZLoO4poaoH

12

2K

162

3K

77K

Beast27Mercy retweeted

Vivo

@vivoplt

2 months ago

claude code is fucking insane i know literally NOTHING about coding. ZERO. and i just built a fully functioning web app in minutes. http://localhost:3000/ check it out

408

12K

519

602

821K

Beast27Mercy retweeted

Suni

@suni_code

2 months ago

Someone on GitHub uploaded LeetCode Patterns 😭 Now I don’t need to GRIND 600 Random Questions anymore 🙂

35

2K

207

3K

115K

Beast27Mercy retweeted

ludwig

@LudwigAhgren

3 months ago

16 days and 4,000 kilometers later we crossed China. We ate like people in China, smoked like people in China, and spoke like absolute idiots never before seen in China.
This trip was harder than last year's. Getting sick, navigating the magical 207, and failing to say the word hotel every night, all the while referring to myself as “p*ssy burger”, took its toll.

I’m lucky to be able to do this cat line with Michael. It’s not easy to find someone you can spend every day and every night with without some tensions brewing, but I could hang out with that dude every day until I die and never get sick of him.
A big thank you to my crew, who are the only reason anyone was able to follow our journey and turn the hours of chaos into an actual series. Editors, translators, producers, fixers, assistants, designers, a dude named Li, and many more people are the reason tip2tip exists.
A special thanks to Cam, Dan, Alicia, Dave, Lucky, and Zhang Wei, who followed us in an RV for 16 days <3
Finally, shoutout to everyone in China. The people there were unforgettably nice to two white colored foreigners riding motorcycles across China 🇨🇳😄✊

378

40K

2K

1K

2M

Beast27Mercy retweeted

Tech with Mak

@techNmak

3 months ago

Someone removed the vector database from RAG and got better results. Much better.

Here's what traditional RAG actually does under the hood:
it chunks your document into pieces, embeds those pieces into vectors, and retrieves based on semantic similarity. The assumption is that similar text = relevant text.

That assumption breaks completely for professional documents.

When you ask "what were the debt trends in Q3?", vector search returns chunks that look similar to that question. But the actual answer might be buried in an appendix, referenced across three sections, in a part of the document that shares zero semantic overlap with your query. Traditional RAG never finds it.
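
A toy sketch of that failure mode, assuming word-count vectors as a stand-in for learned embeddings. The two chunks are invented for illustration and are not taken from any real report or benchmark.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts. Real systems use a learned model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Hypothetical chunks: the first echoes the question, the second holds the data.
chunks = [
    "what were the debt trends analysts expected in q3",
    "appendix g total borrowings by quarter table",
]
query = "what were the debt trends in q3"

top_chunk = retrieve(query, chunks)[0]
answer_score = cosine(embed(query), embed(chunks[1]))
```

The look-alike chunk wins, while the appendix chunk that would actually hold the numbers scores zero because it shares no words with the query. Learned embeddings soften this effect but rank by the same similarity principle.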

Similarity ≠ relevance. PageIndex was built around that insight.

Inspired by AlphaGo, it builds a hierarchical tree index from your document - an intelligent table of contents optimized for LLM reasoning. Then it navigates that tree the way a human expert would. Not pattern matching. Reasoning. "Debt trends are usually in the financial summary or Appendix G, let's look there."

What disappears:
→ No vector DB to build or maintain
→ No arbitrary chunking that breaks cross-section context
→ No opaque retrieval you can't explain or trace

What you get:
→ Retrieval traceable to exact page and section references
→ Multi-step reasoning across document structure
→ Works on financial reports, legal filings, regulatory documents

The benchmark:
→ PageIndex: 98.7% on FinanceBench
→ Perplexity: 45%
→ GPT-4o: 31%

Open source.

17

684

101

838

43K

Beast27Mercy retweeted

Cheese @CheeseOSRS

4 months ago

Alysa Liu recently went viral for her Teen Vogue rant on Chambers of Xeric purple rates in Old School Runescape.

"If you want players to do challenged mode instead of scaled raids, the prayer scroll distribution rate must be altered." https://t.co/oxyFJUv1Tl

25

2K

96

165

117K

Beast27Mercy retweeted

Katyayani Shukla

@aibytekat

4 months ago

Final interview. They ask: "Can you share your payslip?" Your mind races. You say: "Sure, I can send it over later." They smile. You just lost all your leverage. Here’s how to answer without losing the negotiation (and actually increase your offer):

61

2K

236

4K

1M

Beast27Mercy retweeted

Tech with Mak

@techNmak

4 months ago

These are literally the kind of LLM interview questions most candidates wish they had seen earlier.

A curated list of 50 LLM interview questions - shared by Hao Hoang.

What's covered:

Fundamentals:
→ Tokenization and why it matters
→ Attention mechanisms in transformers
→ Context windows and their tradeoffs
→ Embeddings and initialization
→ Positional encodings

Fine-tuning & Efficiency:
→ LoRA vs QLoRA
→ PEFT to prevent catastrophic forgetting
→ Model distillation
→ Adaptive Softmax for large vocabularies

Generation & Decoding:
→ Beam search vs greedy decoding
→ Temperature, top-k, top-p sampling
→ Autoregressive vs masked models

Advanced Concepts:
→ RAG (Retrieval-Augmented Generation)
→ Chain-of-Thought prompting
→ Mixture of Experts (MoE)
→ Knowledge graph integration
→ Zero-shot and few-shot learning

Math & Theory:
→ Softmax in attention
→ Cross-entropy loss
→ KL divergence
→ Gradient computation for embeddings
→ Vanishing gradient solutions in transformers

You don't need to follow me (@techNmak) and comment "LLM". I will put the link in the comments.

40

1K

183

2K

83K

Beast27Mercy retweeted

Goodfire

@GoodfireAI

8 months ago

LLMs memorize a lot of training data, but memorization is poorly understood.

Where does it live inside models? How is it stored? How much is it involved in different tasks?

@jack_merullo_ & @srihita_raju's new paper examines all of these questions using loss curvature! (1/7) https://t.co/w0UWnoBOsX

11

808

136

749

193K

Beast27Mercy retweeted

Jay Alammar

@JayAlammar

8 months ago

The Illustrated NeurIPS 2025: A Visual Map of the AI Frontier

New blog post! NeurIPS 2025 papers are out, and it’s a lot to take in. This visualization lets you explore the entire research landscape interactively, with clusters, summaries, and @cohere LLM-generated explanations that make the field easier to grasp.

Link in thread!

25

1K

215

990

184K
