Natoshi Sakamoto

@CallMeAshay

Old School. Scuderia Ferrari. Manchester City. Apple

Joined April 2020

352 Following

46 Followers

4.1K Posts

CallMeAshay retweeted

CJ Zafir

@cjzafir

14 days ago

My Fine-tuning Stack for Small Language Models (2B to 15B Models)

It costs me around $150 to generate a fresh dataset (~150M) and fine-tune the model.

> Codex 5.5 = orchestrator / operator
> DeepSeek v4 pro / Kimi 2.6 = data gen. engine (dirt cheap)
> Qwen 3.5 = best model to fine-tune (4B, 9B, 27B)
> Unsloth = faster, cheaper fine-tuning framework
> Colab = cheapest cloud GPU (A100 80GB for $0.66/hr)
> Google Drive = to save datasets (good Codex + Colab integration)
> Hugging Face = to host datasets + models

So Codex as planner & auditor,
DeepSeek as cheapest executor,
Unsloth to fine-tune fast,
Colab to get the cheapest A100 GPU,
Hugging Face to host the fine-tuned model.

Anyone can fine-tune and run a custom Sonnet 4.5-level model on their own system.
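A rough way to sanity-check the budget above is to split the ~$150 between data generation and GPU time. The sketch below assumes "~150M" means tokens and uses a hypothetical data-generation price of $0.80 per million tokens; only the $150 total and the $0.66/hr A100 rate come from the post.

```python
A100_USD_PER_HOUR = 0.66        # from the post
TOTAL_BUDGET_USD = 150.0        # from the post
DATASET_TOKENS = 150_000_000    # assumption: "~150M" refers to tokens
USD_PER_MILLION_TOKENS = 0.80   # hypothetical data-generation price


def split_budget(budget_usd, dataset_tokens, usd_per_million_tokens, gpu_usd_per_hour):
    """Return (data generation cost, GPU hours the remaining budget buys)."""
    data_cost = dataset_tokens / 1_000_000 * usd_per_million_tokens
    remaining = budget_usd - data_cost
    if remaining < 0:
        raise ValueError("data generation alone exceeds the budget")
    return data_cost, remaining / gpu_usd_per_hour


data_cost, hours = split_budget(
    TOTAL_BUDGET_USD, DATASET_TOKENS, USD_PER_MILLION_TOKENS, A100_USD_PER_HOUR
)
print(f"data generation: ${data_cost:.2f}, GPU hours left: {hours:.1f}")
```

Changing the per-token price shows how quickly data generation, not GPU rental, can dominate the total.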

CallMeAshay retweeted

Brij Pandey

@LearnWithBrij

16 days ago

AI INFRASTRUCTURE: MASTER TREE 🌲

AI Infrastructure
│
├── 01. Compute Layer
│ ├── GPUs
│ │ ├── H100
│ │ ├── B200
│ │ ├── MI300X
│ │ └── TPU v5
│ │
│ ├── Inference Engines
│ │ ├── vLLM
│ │ ├── TensorRT-LLM
│ │ ├── Ollama
│ │ └── llama.cpp
│ │
│ └── Optimization
│   ├── Quantization
│   ├── KV Cache
│   ├── Speculative Decoding
│   └── Flash Attention
│
├── 02. Model Layer
│ ├── Frontier Models
│ │ ├── GPT-4.1
│ │ ├── Claude
│ │ ├── Gemini
│ │ └── DeepSeek
│ │
│ ├── Open Models
│ │ ├── Llama
│ │ ├── Qwen
│ │ ├── Mistral
│ │ └── Gemma
│ │
│ └── Fine-Tuning
│   ├── LoRA
│   ├── RLHF
│   ├── DPO
│   └── Synthetic Data
│
├── 03. Data Layer
│ ├── Data Pipelines
│ ├── Chunking
│ ├── Embeddings
│ ├── Vector Databases
│ ├── Knowledge Graphs
│ └── Real-Time Streams
│
├── 04. Agent Runtime
│ ├── LangGraph
│ ├── CrewAI
│ ├── OpenAI Agents SDK
│ ├── AutoGen
│ ├── MCP
│ └── Workflow Engines
│
├── 05. Tooling Layer
│ ├── Web Search
│ ├── Browser Use
│ ├── Code Execution
│ ├── APIs
│ ├── Databases
│ └── File Systems
│
├── 06. Deployment Layer
│ ├── Docker
│ ├── Kubernetes
│ ├── Serverless GPUs
│ ├── Edge Inference
│ ├── Cloudflare Workers
│ └── HuggingFace Spaces
│
├── 07. Observability
│ ├── Logs
│ ├── Traces
│ ├── Evaluations
│ ├── Hallucination Detection
│ ├── Latency Tracking
│ └── Cost Monitoring
│
├── 08. Security Layer
│ ├── Sandboxing
│ ├── Permission Systems
│ ├── Secret Management
│ ├── Guardrails
│ ├── Human Approval
│ └── Jailbreak Protection
│
└── 09. The Future
  ├── AI Browsers
  ├── AI Operating Systems
  ├── Autonomous Research Labs
  ├── AI Employees
  └── One-Person Unicorns

Most people think AI is just a model.

The real moat is the infrastructure stack around it.

CallMeAshay retweeted

Lotto

@LottoLabs

15 days ago

It’s very simple:
Find a 3090 or two
Get any mobo that supports 2 PCIe x16 slots (at least x16/x4 for lanes)
Get a 1200W+ PSU
Buy the cheapest DDR4 RAM, 64GB+ (you’re not using it anyway)
Install Linux, vLLM, llama.cpp, SGLang, Tailscale
Download any flavour of Qwen 3.7 27B
You are now localmaxxing
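Whether a 27B model fits on one or two 3090s depends mostly on the quantization level. The sketch below is a weights-only estimate: it assumes 24 GB per RTX 3090 and a hypothetical 80% usable-memory headroom to leave space for the KV cache and runtime overhead, neither of which it models.

```python
GPU_VRAM_GB = 24.0   # RTX 3090
HEADROOM = 0.8       # assumption: fraction of VRAM available for weights


def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: int, num_gpus: int) -> bool:
    """True if the weights alone fit within the headroom across num_gpus cards."""
    return weight_gb(params_billion, bits_per_weight) <= num_gpus * GPU_VRAM_GB * HEADROOM


for bits in (16, 8, 4):
    size = weight_gb(27, bits)
    print(f"{bits}-bit: {size:.1f} GB, one GPU: {fits(27, bits, 1)}, two GPUs: {fits(27, bits, 2)}")
```

Under these assumptions, 16-bit weights do not fit even on two cards, 8-bit needs both, and 4-bit fits on one.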

CallMeAshay retweeted

CyrilXBT

@cyrilXBT

17 days ago

ANDREJ KARPATHY WROTE 65 LINES IN A CLAUDE.MD FILE AND IT JUST HIT NUMBER 1 ON GITHUB TRENDING.

Coding accuracy jumped from 65% to 94%. Not a new model. Not a better subscription. 65 lines of plain text.

Here is what that number actually means. 65% accuracy means one in three things Claude Code builds has a problem. 94% accuracy means almost everything it builds works the first time. That gap is the difference between Claude Code feeling like a powerful tool and Claude Code feeling like a senior engineer who knows your codebase. And Karpathy closed that gap with a text file.

Here is why this works. Claude Code starts every session with zero context about your project, your standards, or how you want it to operate. Without a CLAUDE.md it makes assumptions. Reasonable assumptions compound into unreasonable outcomes across a complex build. With Karpathy's 65 lines it has rules:

Think before you code.
Make surgical changes.
Simplicity first.
Never assume. Verify.
When uncertain, ask.

These are not complex instructions. They are the operating principles of every great engineer compressed into plain text that Claude reads before it touches your codebase.

65 lines. Number 1 on GitHub. A 29-point accuracy improvement.

The entire Claude Code community has been trying to figure out why some setups feel transformative and others feel mediocre. Karpathy just answered the question in 65 lines and published it for free.

Bookmark this before you open Claude Code today. Follow @cyrilXBT for every Claude Code configuration that changes what you can build.

CallMeAshay retweeted

Akshay 🚀

@akshay_pachaar

17 days ago

from prompt to context to harness engineering.

three terms keep coming up in AI engineering, and they get conflated all the time. here is the cleanest way to understand what each one is and how they fit together.

𝗽𝗿𝗼𝗺𝗽𝘁 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗶𝘀 𝘁𝗵𝗲 𝗺𝗲𝘀𝘀𝗮𝗴𝗲.

the model has no memory of anything before this single call, so the prompt has to carry the full universe of what it needs to know. that means a role, some background, the instructions, a few examples, and a format.

these get assembled into one input and sent to the model. when the output falls short, the skill is figuring out which ingredient is actually letting you down, not rewriting the instructions every time.

the unit of work is one input.
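One way to picture "assembled into one input" is a small builder that joins the five ingredients into a single string. This is only a sketch: the `PromptParts` class, the section headings, and the example values are hypothetical, not a format any particular model requires.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptParts:
    role: str
    background: str
    instructions: str
    examples: List[str] = field(default_factory=list)
    output_format: str = ""


def assemble_prompt(parts: PromptParts) -> str:
    """Join the non-empty ingredients into one labelled input string."""
    sections = [
        ("Role", parts.role),
        ("Background", parts.background),
        ("Instructions", parts.instructions),
        ("Examples", "\n".join(parts.examples)),
        ("Format", parts.output_format),
    ]
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections if body)


prompt = assemble_prompt(
    PromptParts(
        role="You are a support agent for a billing product.",
        background="The customer is on the annual plan.",
        instructions="Answer the question in two sentences.",
        examples=["Q: How do I cancel? A: Go to Settings, then Billing."],
    )
)
print(prompt)
```

Keeping the ingredients separate until the last step makes it easier to swap one out and see which of them is causing a weak output.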

𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗶𝘀 𝘁𝗵𝗲 𝗺𝗲𝗺𝗼𝗿𝘆.

across multiple steps, the window is finite and the information available is not, which forces a curation step. without it, important details get buried under stale tool outputs and old turns, and the model's attention degrades on the things that actually matter.

a curator selects what stays, compresses what is useful but bulky, and drops the rest. each step's output then feeds into the next step, where good curation is more about knowing what to throw away than packing more in.

the unit of work is what stays in the window, step by step.
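The select, compress, drop step can be sketched as a budgeted pass over candidate items. Everything here is an illustrative assumption: the `priority` score, the whitespace-based token count, and the `summarize` callback stand in for whatever ranking, tokenizer, and summarizer a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    text: str
    priority: int  # hypothetical importance score; higher is kept first


def count_tokens(text: str) -> int:
    """Crude token proxy: whitespace-separated words."""
    return len(text.split())


def curate(items: List[Item], budget: int, summarize: Callable[[str], str]) -> List[str]:
    """Keep items whole if they fit, compressed if only that fits, else drop them."""
    kept: List[str] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        cost = count_tokens(item.text)
        if used + cost <= budget:
            kept.append(item.text)
            used += cost
            continue
        short = summarize(item.text)
        short_cost = count_tokens(short)
        if used + short_cost <= budget:
            kept.append(short)
            used += short_cost
        # Otherwise the item is dropped entirely.
    return kept


def first_two_words(text: str) -> str:
    return " ".join(text.split()[:2])


window = curate(
    [
        Item("user wants a refund", 3),
        Item("tool output: order 1182 shipped on Monday", 2),
        Item("old turn: greeting and small talk about the weather", 1),
    ],
    budget=7,
    summarize=first_two_words,
)
print(window)
```

Note that this sketch returns items in priority order rather than their original order; a real curator would probably restore chronology before handing the window to the model.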

𝗵𝗮𝗿𝗻𝗲𝘀𝘀 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗶𝘀 𝘁𝗵𝗲 𝗺𝗮𝗰𝗵𝗶𝗻𝗲.

on its own, a model just generates text. the harness is what turns it into something that can take actions, check its own work, and recover when a step goes wrong.

the full loop has three phases:

- 𝗴𝗮𝘁𝗵𝗲𝗿 pulls together everything the model needs
- 𝗮𝗰𝘁 runs the model and calls tools or sub-agents
- and 𝘃𝗲𝗿𝗶𝗳𝘆 checks the output with tests or a judge

on failure, the whole loop retries with updated context, which is the entire difference between calling an API and running an agent.

the unit of work is the machine itself.
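The gather, act, verify loop with retries can be sketched in a few lines. The `gather`, `act`, and `verify` functions below are stubs standing in for a real context builder, model call, and test suite; their names and signatures are assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def run_harness(
    task: str,
    gather: Callable[[str, List[str]], str],
    act: Callable[[str], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> str:
    """Run gather -> act -> verify, retrying with failure notes added to the context."""
    failures: List[str] = []
    for _ in range(max_attempts):
        context = gather(task, failures)  # gather: build the model input
        output = act(context)             # act: run the model or tools
        ok, reason = verify(output)       # verify: tests or a judge
        if ok:
            return output
        failures.append(reason)           # feeds the next attempt's context
    raise RuntimeError(f"no verified output after {max_attempts} attempts: {failures}")


def gather(task: str, failures: List[str]) -> str:
    return task + "".join(f"\nPrevious failure: {f}" for f in failures)


def act(context: str) -> str:
    # Stub model: only answers correctly once it has seen a failure note.
    return "4" if "Previous failure" in context else "5"


def verify(output: str) -> Tuple[bool, str]:
    return output == "4", f"expected 4, got {output}"


result = run_harness("What is 2 + 2?", gather, act, verify)
print(result)
```

Without the loop, the first wrong answer would simply be returned; the retry with updated context is what the stub uses to recover.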

here is the part that ties it together.

prompt engineering and context engineering both live inside 𝗴𝗮𝘁𝗵𝗲𝗿. the harness is the outer container, context is what it curates, and the prompt is what it finally hands to the model.

zoom out and the unit of work gets bigger. zoom in and you are back at the prompt.

i also published this deep dive (article) on agent harness engineering, covering the orchestration loop, tools, memory, context management, and everything else that transforms a stateless LLM into a capable agent.

the article is quoted below.

CallMeAshay retweeted

CJ Zafir

@cjzafir

16 days ago

Do something different this weekend.

Become a PRO in AI Model Fine-tuning.

Paste this prompt in Codex/ChatGPT/Claude/Grok.

"You are an expert AI engineer and teacher.

Your job is to teach me modern LLM engineering and fine-tuning concepts from beginner to advanced level using very simple daily-life language.

Teach me step-by-step like a real mentor. Assume I am smart but new to the topic.

Foundations:

- LLM basics
- How AI models work
- Tokens
- Tokenization
- Context windows
- Embeddings
- Transformers
- Attention mechanism
- Parameters
- Training vs inference
- Open-source vs closed-source models

Datasets & Training:

- SFT datasets
- Instruction tuning
- Preference datasets
- Synthetic datasets
- Data curation
- Dataset cleaning
- Dataset formatting
- Fine-tuning basics
- Continued pretraining
- Hallucination reduction

Fine-Tuning:

- LoRA
- QLoRA
- DPO
- RLHF
- Quantization
- Model checkpoints
- Adapter tuning
- GGUF models

Inference & Optimization:

- KV cache
- Flash Attention
- Speculative decoding
- Inference optimization
- Model serving
- Batch inference
- GPU basics
- VRAM basics
- Latency vs quality tradeoffs

Local AI Ecosystem:

- llama.cpp
- Ollama
- vLLM
- MLX
- Hugging Face
- Unsloth
- Axolotl
- PEFT
- TRL library

RAG & Memory:

- RAG
- Vector databases
- Chunking
- Retrieval pipelines
- AI memory systems
- Semantic search

Agents & Workflows:

- Prompt engineering
- System prompts
- Tool calling
- Function calling
- AI agents
- Agentic workflows
- Multi-agent systems
- Browser agents

Model Types:

- VLMs
- SLMs
- Dense models
- MoE models
- Coding models
- Reasoning models

Deployment:

- Local inference
- On-device AI
- API serving
- Cloud GPUs
- Edge AI basics

Evaluation:

- AI benchmarks
- Human evals
- Cost-per-token analysis
- Speed benchmarking
- Quality benchmarking

Real-World Skills:

- Building chatbots
- Building AI copilots
- AI automation
- AI SaaS workflows
- AI coding workflows
- AI orchestration systems
- AI product thinking

Start from the absolute basics and gradually make me advanced.

Rules:

- Use simple English only
- Avoid academic jargon unless necessary
- Explain every difficult word in plain language
- Use real-world analogies and daily-life examples
- Use small code snippets when useful
- Show practical use cases
- Compare concepts side-by-side when helpful
- Teach from fundamentals first, then advanced concepts
- At the end of each topic:
  - give a short summary
  - give a simple mental model
  - give beginner mistakes to avoid
  - give a small exercise/project

I want deep understanding, not memorization."

Thank me later.

CallMeAshay retweeted

FOX Soccer

@FOXSoccer

18 days ago

The crowd chanting "SIUUUU!" as Ronaldo bangs the drum 🔥

CallMeAshay retweeted

Akshay Shinde

@ConsciousRide

26 days ago

90% of LangGraph & Agent Framework interviews in 2026 are just these 10 concepts repeated:

Natoshi Sakamoto @CallMeAshay

27 days ago

@KarmSumal Is it comparable to Toronto now or still no at this point?

CallMeAshay retweeted

bodila

@51bodila

about 1 month ago

Jane Street didn’t hire a vibe-coder at $385k/year because he didn’t use Claude Code.

37 minutes of vibe coding right during an interview at a Tier 1 fund.

Bookmark & watch: you’ll finally understand why you need to use Claude Code.

Then, read the article below.

CallMeAshay retweeted

Rahul

@sairahul1

about 1 month ago

Two Anthropic engineers spent 24 minutes exposing every Claude Code feature you didn't know existed. Most people will scroll past this. Don't be most people.

Natoshi Sakamoto @CallMeAshay

about 1 month ago

@10sCpa @pavyg They don't give a fuck about the war. They just want to be portrayed as heroes in their own country by doing these kinds of things.

Natoshi Sakamoto @CallMeAshay

about 1 month ago

@Ritamery791 @MarioNawfal It's faster to deliver

Natoshi Sakamoto @CallMeAshay

about 1 month ago

@another_fi53412 @JennyB311 No part of breaking any law is going to help you not pay for those illegal aliens, joker. No wonder you're just another low-IQ hatemonger here. Imagine my shock.

Natoshi Sakamoto @CallMeAshay

about 1 month ago

@another_fi53412 @JennyB311 What part of entering private property and refusing to obey security instructions is legal? Tit for tat? Is that how you guys play around there now?

CallMeAshay retweeted

ChatGPT

@ChatGPTapp

about 1 month ago

at long last

Natoshi Sakamoto @CallMeAshay

about 1 month ago

@TheUltimateNoor @_sabasaurus If you look at one of the other cofounders, he interviewed at Google for an intern role at age 18, and a Google VP at the time decided right away that he would fund whatever this guy built. Many people are called in for interviews by Google; only some make such an impact.

Natoshi Sakamoto @CallMeAshay

about 1 month ago

@TheUltimateNoor @_sabasaurus PhDs are the only stream of study where you get the funding to literally try things out. The people you're surrounded by play the most important role in these situations, but you need to do something to get into those environments before you can capitalize on them.

Natoshi Sakamoto @CallMeAshay

about 1 month ago

@holycatsetra @RoundtableSpace There is nothing for them to patch here. It's totally unrelated to their service. I think someone has bought the domain playimdb or something similar, which takes in arguments just like IMDb does and plays the movie from another (possibly illegal) website. IMDb would for sure report that.
