Announcing 100% free MiniMax M3 with a blazing fast provider.
- Up to 200 tokens/sec 🔥
- It's so good we had to crown it "RECOMMENDED"!
Try now, it's free:
npm i -g freebuff
56,000+ tokens/sec at just 80 MHz. 🤯
I burned a full Transformer with KV cache into a custom chip. Designed gate by gate as a 100% digital integrated circuit. Prototyped on a FPGA. (No GPU. No CPU)
Just pure digital silicon running @karpathy microGPT, spelling out names on a tiny LCD.
This is GateGPT 👇
Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️
MTP enables Google Gemma 4 run ~1.4–2.2× faster with no accuracy loss.
Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s.
GGUFs + Guide: https://t.co/c4gAUlb6YE
Microsoft has released MAI-Transcribe-1.5: an exceptionally fast speech transcription model at a speed factor of ~276x, while still achieving 2.4% on AA-WER (#3), leading the accuracy-speed Pareto frontier
MAI-Transcribe-1.5 is Microsoft AI (MAI)’s latest speech transcription model, coming in at 3rd overall on the on the Artificial Analysis Word Error Rate (AA-WER) leaderboard, behind Alibaba’s Fun-Realtime-ASR-preview (1.7% WER), and ElevenLabs Scribe v2 (2.2% WER). The model stands out as the fastest STT model in the top 10 for accuracy, processing audio at ~276x real-time - this is more than double the speed of the second fastest model in the top 10 for accuracy.
The new model supports keyword biasing (improved recognition of rarer vocabulary such as names and medical terminology), in addition to support for 43 languages including English, French, Arabic, Japanese, and Chinese.
See more details below ⬇️
Qwen3.6 35B A3B can't fill out a paper form on its own. But give it NVIDIA's LocateAnything-3B — the #1 trending model on HuggingFace — as its eyes, and the two small models get it done together.
(The test: place each element at the right pixel position on a blank form image, not type into a field.)
Setup:
> Qwen is the brain (main model), LocateAnything is the eyes (helper model acting as a tool).
> I gave Qwen a new tool: ask "where's the email field?" and LocateAnything returns the exact x, y, width, height.
> The blue boxes on the screen are its detections. Look how tight they are — it nails every field.
Result:
> Qwen3.6 35B A3B + LocateAnything-3B: form completed, all info correct.
> Name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code: all landed in the right field areas.
> Character-box alignment still a touch loose, but every value is where it belongs.
> 9m10s, 224.5k input, 24.3k output, 21 turns.
Why it matters:
> Qwen alone can't finish this test. Bolt on a 3B model that does exactly one thing > locate > and suddenly it can.
> A combination of small models can do the work of a single large one.
Training an LLM from scratch is easier to study when the whole path is in one repo.
Train LLM From Scratch is a PyTorch repository for learning how a transformer language model is built, trained, saved, and used for text generation.
It helps you move from “I understand attention on paper” to a runnable training pipeline by pairing model code with data download, preprocessing, config, training, and generation scripts.
Key features:
• Transformer components from scratch – separate PyTorch modules for MLP, attention, transformer blocks, and the final model
• Pile-based data path – scripts download The Pile files and preprocess JSONL.ZST text into tokenized HDF5 datasets
• Configurable training setup – model size, context length, heads, blocks, batch size, learning rate, and file paths live in https://t.co/zuPqaR3MhP
• Hardware guidance – README compares common GPUs for 13M and 2B-class training runs
• Generation workflow included – generate_text.py loads trained checkpoints and produces sample text outputs
It’s open-source (MIT license).
Link in the reply 👇
HUGGING FACE DROPPED A FREE CONTEXT ENGINEERING COURSE
and the curriculum is stacked:
▫️ unit 1: agent skills + SKILL.md format
▫️ unit 2: MCP (model context protocol)
▫️ unit 3: plugins for tool distribution
▫️ unit 4: subagents + multi-agent workflows
▫️ unit 5: hooks to guard the agent lifecycle
▫️ bonus: build your own agent from scratch
https://t.co/1HjjaXVOek
Claude Code cannot read 300 files at once.
So someone built a system that lets it control NotebookLM from the terminal instead. The results are wild.
Here is the full workflow nobody is talking about:
The Setup
→ Claude Code connects to NotebookLM via a command line interface
→ Claude searches YouTube, finds relevant videos, uploads them as sources automatically
→ NotebookLM processes up to 300 sources simultaneously and returns cited, grounded answers
→ Everything syncs back into your Obsidian vault with passage-level citations you can click to verify
Why This Changes Research Forever
→ No more 20 browser tabs you never close
→ No more copy-pasting outputs into random notes
→ No more hallucinated answers with no sources to back them up
→ 60% of citations verified as strong matches in accuracy audits - answers are grounded in real data
What Claude Can Do From the Terminal
→ Search YouTube for relevant videos on any topic and rank by relevance
→ Create a new NotebookLM notebook and add 20 sources in parallel automatically
→ Ask questions and export cited answers directly into Obsidian with wikilinks
→ Set custom personas per notebook - concise, no filler, no preamble
→ Generate audio overviews and save them as MP3 files into your vault
→ Build mind maps, flashcard decks, and research dashboards from your sources
→ Search arXiv for academic papers and feed them directly into NotebookLM
→ Upload competitor blog posts, podcast episodes, PDFs, and your own vault notes
The Obsidian Output
→ Every answer arrives with clickable citations that link to the exact passage in the source video or article
→ Graph view shows connections between all 20 sources and the topics they share
→ Q&A log tracks every question asked and the grounded response received
→ Source dashboard shows citation frequency, topics extracted, and which questions each source answered
Use Cases Worth Building Today
→ Academic research with arXiv papers, full citation traceability
→ Competitor analysis from their YouTube channels and blog posts
→ Company knowledge base for onboarding, new employees ask NotebookLM instead of interrupting teammates
→ Podcast research, feed 4-hour Lex Fridman episodes and ask what's new in AI this week
→ Personal second brain, 300 daily notes uploaded and queryable in one notebook
Before this system existed you needed 20 tabs, hours of manual reading, and no guarantee the answers were real.
Now you type one prompt in the terminal and Claude does all of it for you.
The research stack of 2026 is not a browser. It is a terminal connected to everything
I just finished creating a guide that connects NotebookLM + Antigravity
Spent 67 hours creating this system that turns your knowledge base into an AI agent that actually takes action
BONUS: Complete guide for building 10 workflows + copy-paste prompts
Like & Comment "FREE" and I'll DM it to you as fast as i can
NO WAY.
they just trained a 1B AI model with a framework written entirely by AI agents 🤯
MiniCPM just dropped MiniCPM-5 1B — a browser-runnable model that punches way above its size.
→ runs locally on CPU
→ works inside your browser
→ beats Qwen3.5-0.8B + LFM2.5-1.2B on math, code, and reasoning
→ trained with an AI-generated framework, almost zero human-written infrastructure
→ reportedly 10% faster than NVIDIA Megatron
the craziest part?
we’re entering the era where AI is starting to build the systems that train the next generation of AI.
not “AI assistants.”
actual recursive AI engineering.
Repo 👇
How to boost LLM accuracy by 2-10x without training?
OptiLLM is an OpenAI-compatible inference proxy that boosts LLM accuracy by 2-10x on math, coding, and reasoning tasks.
No training. No fine-tuning.
Just proxy your API calls through it.
A step-by-step guide 🧵👇
The Science of Learning Math (and Anything Else)
[2:10] My background: growing up in a non-technical family and finding math on my own.
[5:45] Self-studying 3,000 hours of college math in high school: starting with calculus the summer after 10th grade and continuing through undergraduate-level math for the rest of high school.
[16:10] Whether the same ground could have been covered more efficiently -- and how being responsible for other people's learning eventually crystallized the underlying principles.
[29:55] How having math foundations in place paid off in research: getting into Fermilab and CERN research projects at university labs.
[43:10] What the Math Academy learning system looks like: adaptive diagnostic, custom knowledge graph, minimum effective doses of instruction followed immediately by problem-solving, mastery before advancing.
[47:34] How we built the knowledge graph: years of manual work by domain experts, refined with analytics for nearly a decade.
[1:10:46] How the FIRE (Fractional Implicit REpetition) algorithm works: solving a harder problem implicitly reviews the sub-skills it encompasses, compressing the review pile significantly.
[1:35:50] Math and sport. Cognitive science principles -- mastery before advancing, spaced practice, interleaving -- are often easier to see in sport than in math.
[1:42:00] Does doing math well require different skills than teaching it well?
[1:56:25] Automaticity as a prerequisite for deeper understanding.
[2:05:35] The anatomy of "aha" moments.
[2:14:11] Learning math as an adult: the amount of work doesn't change, only your free time does. Math Academy's Mathematical Foundations sequence covers the prerequisite stack for university math in roughly 15,000 minutes.
[2:24:10] Balancing fundamentals and exploration: exploration pays off most at the frontier of a subject.
[2:33:55] Is it ever too late?
[2:46:00] Bottom-up versus top-down learning.
[2:56:30] Students with ADHD often feel the effects of inefficient pedagogy more strongly. Interleaving minimum effective doses of guided instruction and active problem-solving is better for everyone.
[3:06:20] AI tools as a multiplier on existing ability: the more you know, the more useful they are; the less you know, the harder it is to detect when they've gone wrong.
[3:14:37] What I'm most focused on right now: taking Math Academy from workshop to factory -- producing courses at scale without sacrificing quality.
Instead of watching Netflix tonight, watch this Stanford lecture.
It explains how ChatGPT and Claude are actually built.
Stanford released it free.
Bookmark this.
Terence Tao says the math behind today’s LLMs is actually simple. Training and running them mostly uses linear algebra, matrix multiplication, and a bit of calculus, material an undergraduate can handle. We understand how to build and operate these models.
The real mystery is why they work so well on some tasks and fail on others, and why we cannot predict that in advance. We lack good rules for forecasting performance across tasks, so progress is largely empirical.
A key reason is the nature of real-world data. Pure noise is well understood, perfectly structured data is well understood, but natural text sits in between, partly structured and partly random. Mathematics for that middle regime is thin, similar to how physics struggles at meso-scales between atoms and continua.
Because of this gap, we can describe the mechanisms but cannot yet explain capability jumps or give reliable task-level predictions. That mismatch, simple machinery versus hard-to-predict behavior, is the core puzzle.
----
Video from 'Dr Brian Keating' YT Channel (Link in comment)
A wonderful book. Unlike so many books I've read, it does not assume the reader is already into mathematics. It offers definitions, wraps chapters in biographies, and does its best to go beyond the West without neglecting it. A great gift even for precocious teenagers, I think!
Before jumping into books like DDIA or Database Internals, it helps to understand the systems layer these designs are built on.
A lot of the design of such data-intensive systems is based on virtual memory: page tables, page faults, mmap, the page cache, swapping, NUMA placement, TLBs, and the tradeoffs between what the OS wants and what the database wants.
My latest article is a ~25,000-word mini-book on virtual memory.
It starts from first principles and goes all the way down to advanced topics like NUMA placement and performance debugging with tools like perf and /proc.
I also wrote it differently: as a dialogue between a user-space process and the kernel.
Most treatments of virtual memory are dry and fact-heavy. I wanted this one to feel more like a story, while still being technically deep.
Link below.
Mastering these NLP papers places you in the top 1% in AI.
The MOST Influential: GPT, BERT, Attention, DeepSeek.
https://t.co/z3RWA6DsLb
- Word2Vec
- BERT
- GPT-2
- GPT-3
- Attention is All You Need
- Mixture of Experts (MOE)
- DeepSeekMath (with GRPO)
New article: a visual tour of recent LLM architecture advances, from Gemma 4 to DeepSeek V4.
I focus on long-context efficiency tweaks like KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC.
Link: https://t.co/KO81y3kTH7