Pramoth

@ImMrMoth

Programmer,Java,Spring

Joined October 2009

1.3K Following

48 Followers

1.7K Posts

ImMrMoth retweeted

James Grugett

@jahooma

9 days ago

Announcing 100% free MiniMax M3 with a blazing fast provider. - Up to 200 tokens/sec 🔥 - It's so good we had to crown it "RECOMMENDED"! Try now, it's free: npm i -g freebuff

604

468

43K

ImMrMoth retweeted

Fabio Guzman

@FGuzmanAI

14 days ago

56,000+ tokens/sec at just 80 MHz. 🤯 I burned a full Transformer with KV cache into a custom chip. Designed gate by gate as a 100% digital integrated circuit. Prototyped on a FPGA. (No GPU. No CPU) Just pure digital silicon running @karpathy microGPT, spelling out names on a tiny LCD. This is GateGPT 👇

168

577

734K

ImMrMoth retweeted

Unsloth AI

@UnslothAI

16 days ago

Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️ MTP enables Google Gemma 4 run ~1.4–2.2× faster with no accuracy loss. Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s. GGUFs + Guide: https://t.co/c4gAUlb6YE

UnslothAI's tweet photo. Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️

MTP enables Google Gemma 4 run ~1.4–2.2× faster with no accuracy loss.

Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s.

GGUFs + Guide: https://t.co/c4gAUlb6YE https://t.co/hElqKNTyCn

258

219K

ImMrMoth retweeted

Macaron Official

@Macaron0fficial

19 days ago

Introducing Macaron-V1-Preview: 749B MoL Frontier Agent Model. Experience：https://t.co/1HA0dEjx8m Open Source: https://t.co/IHA18jgQDD Blog: https://t.co/5kIXGKEmMQ

Macaron0fficial's tweet photo. Introducing Macaron-V1-Preview: 749B MoL Frontier Agent Model.

Experience：https://t.co/1HA0dEjx8m
Open Source: https://t.co/IHA18jgQDD
Blog: https://t.co/5kIXGKEmMQ https://t.co/HAGEw4Uji0

228

122

33K

Who to follow

Wannaphong Phatthiyaphaibun

@wannaphong_p

PhD student at VISTEC, Thailand. I am a open source developer. @PyThaiNLP Founder

dorn

@dorndornd

Interested in quantitative trading and computer science

Antonio Julián Arrieta Cuartero

@Toni_Arrieta_C

Aprendiendo cosas nuevas todos los días para mantener mi cerebro activo. Me gusta Clojure y ClojureScript. También Ruby, Haskell y Smalltalk.

ImMrMoth retweeted

Artificial Analysis

@ArtificialAnlys

25 days ago

Microsoft has released MAI-Transcribe-1.5: an exceptionally fast speech transcription model at a speed factor of ~276x, while still achieving 2.4% on AA-WER (#3), leading the accuracy-speed Pareto frontier MAI-Transcribe-1.5 is Microsoft AI (MAI)’s latest speech transcription model, coming in at 3rd overall on the on the Artificial Analysis Word Error Rate (AA-WER) leaderboard, behind Alibaba’s Fun-Realtime-ASR-preview (1.7% WER), and ElevenLabs Scribe v2 (2.2% WER). The model stands out as the fastest STT model in the top 10 for accuracy, processing audio at ~276x real-time - this is more than double the speed of the second fastest model in the top 10 for accuracy. The new model supports keyword biasing (improved recognition of rarer vocabulary such as names and medical terminology), in addition to support for 43 languages including English, French, Arabic, Japanese, and Chinese. See more details below ⬇️

ArtificialAnlys's tweet photo. Microsoft has released MAI-Transcribe-1.5: an exceptionally fast speech transcription model at a speed factor of ~276x, while still achieving 2.4% on AA-WER (#3), leading the accuracy-speed Pareto frontier

MAI-Transcribe-1.5 is Microsoft AI (MAI)’s latest speech transcription model, coming in at 3rd overall on the on the Artificial Analysis Word Error Rate (AA-WER) leaderboard, behind Alibaba’s Fun-Realtime-ASR-preview (1.7% WER), and ElevenLabs Scribe v2 (2.2% WER). The model stands out as the fastest STT model in the top 10 for accuracy, processing audio at ~276x real-time - this is more than double the speed of the second fastest model in the top 10 for accuracy.

The new model supports keyword biasing (improved recognition of rarer vocabulary such as names and medical terminology), in addition to support for 43 languages including English, French, Arabic, Japanese, and Chinese.

See more details below ⬇️

615

252

48K

ImMrMoth retweeted

stevibe

@stevibe

25 days ago

Qwen3.6 35B A3B can't fill out a paper form on its own. But give it NVIDIA's LocateAnything-3B — the #1 trending model on HuggingFace — as its eyes, and the two small models get it done together. (The test: place each element at the right pixel position on a blank form image, not type into a field.) Setup: > Qwen is the brain (main model), LocateAnything is the eyes (helper model acting as a tool). > I gave Qwen a new tool: ask "where's the email field?" and LocateAnything returns the exact x, y, width, height. > The blue boxes on the screen are its detections. Look how tight they are — it nails every field. Result: > Qwen3.6 35B A3B + LocateAnything-3B: form completed, all info correct. > Name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code: all landed in the right field areas. > Character-box alignment still a touch loose, but every value is where it belongs. > 9m10s, 224.5k input, 24.3k output, 21 turns. Why it matters: > Qwen alone can't finish this test. Bolt on a 3B model that does exactly one thing > locate > and suddenly it can. > A combination of small models can do the work of a single large one.

275

148K

ImMrMoth retweeted

Dan Kornas

@DanKornas

28 days ago

Training an LLM from scratch is easier to study when the whole path is in one repo. Train LLM From Scratch is a PyTorch repository for learning how a transformer language model is built, trained, saved, and used for text generation. It helps you move from “I understand attention on paper” to a runnable training pipeline by pairing model code with data download, preprocessing, config, training, and generation scripts. Key features: • Transformer components from scratch – separate PyTorch modules for MLP, attention, transformer blocks, and the final model • Pile-based data path – scripts download The Pile files and preprocess JSONL.ZST text into tokenized HDF5 datasets • Configurable training setup – model size, context length, heads, blocks, batch size, learning rate, and file paths live in https://t.co/zuPqaR3MhP • Hardware guidance – README compares common GPUs for 13M and 2B-class training runs • Generation workflow included – generate_text.py loads trained checkpoints and produces sample text outputs It’s open-source (MIT license). Link in the reply 👇

DanKornas's tweet photo. Training an LLM from scratch is easier to study when the whole path is in one repo.

Train LLM From Scratch is a PyTorch repository for learning how a transformer language model is built, trained, saved, and used for text generation.

It helps you move from “I understand attention on paper” to a runnable training pipeline by pairing model code with data download, preprocessing, config, training, and generation scripts.

Key features:

• Transformer components from scratch – separate PyTorch modules for MLP, attention, transformer blocks, and the final model
• Pile-based data path – scripts download The Pile files and preprocess JSONL.ZST text into tokenized HDF5 datasets
• Configurable training setup – model size, context length, heads, blocks, batch size, learning rate, and file paths live in https://t.co/zuPqaR3MhP
• Hardware guidance – README compares common GPUs for 13M and 2B-class training runs
• Generation workflow included – generate_text.py loads trained checkpoints and produces sample text outputs

It’s open-source (MIT license).

Link in the reply 👇

201

45K

ImMrMoth retweeted

Vaishnavi

@_vmlops

28 days ago

HUGGING FACE DROPPED A FREE CONTEXT ENGINEERING COURSE and the curriculum is stacked: ▫️ unit 1: agent skills + SKILL.md format ▫️ unit 2: MCP (model context protocol) ▫️ unit 3: plugins for tool distribution ▫️ unit 4: subagents + multi-agent workflows ▫️ unit 5: hooks to guard the agent lifecycle ▫️ bonus: build your own agent from scratch https://t.co/1HjjaXVOek

559

876

32K

ImMrMoth retweeted

Dami-Defi

@DamiDefi

about 1 month ago

Claude Code cannot read 300 files at once. So someone built a system that lets it control NotebookLM from the terminal instead. The results are wild. Here is the full workflow nobody is talking about: The Setup → Claude Code connects to NotebookLM via a command line interface → Claude searches YouTube, finds relevant videos, uploads them as sources automatically → NotebookLM processes up to 300 sources simultaneously and returns cited, grounded answers → Everything syncs back into your Obsidian vault with passage-level citations you can click to verify Why This Changes Research Forever → No more 20 browser tabs you never close → No more copy-pasting outputs into random notes → No more hallucinated answers with no sources to back them up → 60% of citations verified as strong matches in accuracy audits - answers are grounded in real data What Claude Can Do From the Terminal → Search YouTube for relevant videos on any topic and rank by relevance → Create a new NotebookLM notebook and add 20 sources in parallel automatically → Ask questions and export cited answers directly into Obsidian with wikilinks → Set custom personas per notebook - concise, no filler, no preamble → Generate audio overviews and save them as MP3 files into your vault → Build mind maps, flashcard decks, and research dashboards from your sources → Search arXiv for academic papers and feed them directly into NotebookLM → Upload competitor blog posts, podcast episodes, PDFs, and your own vault notes The Obsidian Output → Every answer arrives with clickable citations that link to the exact passage in the source video or article → Graph view shows connections between all 20 sources and the topics they share → Q&A log tracks every question asked and the grounded response received → Source dashboard shows citation frequency, topics extracted, and which questions each source answered Use Cases Worth Building Today → Academic research with arXiv papers, full citation traceability → Competitor analysis from their YouTube channels and blog posts → Company knowledge base for onboarding, new employees ask NotebookLM instead of interrupting teammates → Podcast research, feed 4-hour Lex Fridman episodes and ask what's new in AI this week → Personal second brain, 300 daily notes uploaded and queryable in one notebook Before this system existed you needed 20 tabs, hours of manual reading, and no guarantee the answers were real. Now you type one prompt in the terminal and Claude does all of it for you. The research stack of 2026 is not a browser. It is a terminal connected to everything

176

252K

ImMrMoth retweeted

Lian Lim | Dashboard & AI Automation Expert

@dashboardlim

about 1 month ago

I just finished creating a guide that connects NotebookLM + Antigravity Spent 67 hours creating this system that turns your knowledge base into an AI agent that actually takes action BONUS: Complete guide for building 10 workflows + copy-paste prompts Like & Comment "FREE" and I'll DM it to you as fast as i can

dashboardlim's tweet photo. I just finished creating a guide that connects NotebookLM + Antigravity

Spent 67 hours creating this system that turns your knowledge base into an AI agent that actually takes action

BONUS: Complete guide for building 10 workflows + copy-paste prompts

Like & Comment "FREE" and I'll DM it to you as fast as i can

217

134K

ImMrMoth retweeted

divyansh tiwari

@DivyanshT91162

about 1 month ago

NO WAY. they just trained a 1B AI model with a framework written entirely by AI agents 🤯 MiniCPM just dropped MiniCPM-5 1B — a browser-runnable model that punches way above its size. → runs locally on CPU → works inside your browser → beats Qwen3.5-0.8B + LFM2.5-1.2B on math, code, and reasoning → trained with an AI-generated framework, almost zero human-written infrastructure → reportedly 10% faster than NVIDIA Megatron the craziest part? we’re entering the era where AI is starting to build the systems that train the next generation of AI. not “AI assistants.” actual recursive AI engineering. Repo 👇

ImMrMoth retweeted

Afiz ⚡️

@itsafiz

about 1 month ago

How to boost LLM accuracy by 2-10x without training? OptiLLM is an OpenAI-compatible inference proxy that boosts LLM accuracy by 2-10x on math, coding, and reasoning tasks. No training. No fine-tuning. Just proxy your API calls through it. A step-by-step guide 🧵👇

itsafiz's tweet photo. How to boost LLM accuracy by 2-10x without training?

OptiLLM is an OpenAI-compatible inference proxy that boosts LLM accuracy by 2-10x on math, coding, and reasoning tasks.

No training. No fine-tuning.
Just proxy your API calls through it.

A step-by-step guide 🧵👇 https://t.co/jHXjnlmYV0

ImMrMoth retweeted

Justin Skycak

@justinskycak

about 1 month ago

The Science of Learning Math (and Anything Else) [2:10] My background: growing up in a non-technical family and finding math on my own. [5:45] Self-studying 3,000 hours of college math in high school: starting with calculus the summer after 10th grade and continuing through undergraduate-level math for the rest of high school. [16:10] Whether the same ground could have been covered more efficiently -- and how being responsible for other people's learning eventually crystallized the underlying principles. [29:55] How having math foundations in place paid off in research: getting into Fermilab and CERN research projects at university labs. [43:10] What the Math Academy learning system looks like: adaptive diagnostic, custom knowledge graph, minimum effective doses of instruction followed immediately by problem-solving, mastery before advancing. [47:34] How we built the knowledge graph: years of manual work by domain experts, refined with analytics for nearly a decade. [1:10:46] How the FIRE (Fractional Implicit REpetition) algorithm works: solving a harder problem implicitly reviews the sub-skills it encompasses, compressing the review pile significantly. [1:35:50] Math and sport. Cognitive science principles -- mastery before advancing, spaced practice, interleaving -- are often easier to see in sport than in math. [1:42:00] Does doing math well require different skills than teaching it well? [1:56:25] Automaticity as a prerequisite for deeper understanding. [2:05:35] The anatomy of "aha" moments. [2:14:11] Learning math as an adult: the amount of work doesn't change, only your free time does. Math Academy's Mathematical Foundations sequence covers the prerequisite stack for university math in roughly 15,000 minutes. [2:24:10] Balancing fundamentals and exploration: exploration pays off most at the frontier of a subject. [2:33:55] Is it ever too late? [2:46:00] Bottom-up versus top-down learning. [2:56:30] Students with ADHD often feel the effects of inefficient pedagogy more strongly. Interleaving minimum effective doses of guided instruction and active problem-solving is better for everyone. [3:06:20] AI tools as a multiplier on existing ability: the more you know, the more useful they are; the less you know, the harder it is to detect when they've gone wrong. [3:14:37] What I'm most focused on right now: taking Math Academy from workshop to factory -- producing courses at scale without sacrificing quality.

justinskycak's tweet photo. The Science of Learning Math (and Anything Else)

[2:10] My background: growing up in a non-technical family and finding math on my own.

[5:45] Self-studying 3,000 hours of college math in high school: starting with calculus the summer after 10th grade and continuing through undergraduate-level math for the rest of high school.

[16:10] Whether the same ground could have been covered more efficiently -- and how being responsible for other people's learning eventually crystallized the underlying principles.

[29:55] How having math foundations in place paid off in research: getting into Fermilab and CERN research projects at university labs.

[43:10] What the Math Academy learning system looks like: adaptive diagnostic, custom knowledge graph, minimum effective doses of instruction followed immediately by problem-solving, mastery before advancing.

[47:34] How we built the knowledge graph: years of manual work by domain experts, refined with analytics for nearly a decade.

[1:10:46] How the FIRE (Fractional Implicit REpetition) algorithm works: solving a harder problem implicitly reviews the sub-skills it encompasses, compressing the review pile significantly.

[1:35:50] Math and sport. Cognitive science principles -- mastery before advancing, spaced practice, interleaving -- are often easier to see in sport than in math.

[1:42:00] Does doing math well require different skills than teaching it well?

[1:56:25] Automaticity as a prerequisite for deeper understanding.

[2:05:35] The anatomy of "aha" moments.

[2:14:11] Learning math as an adult: the amount of work doesn't change, only your free time does. Math Academy's Mathematical Foundations sequence covers the prerequisite stack for university math in roughly 15,000 minutes.
[2:24:10] Balancing fundamentals and exploration: exploration pays off most at the frontier of a subject.

[2:33:55] Is it ever too late?

[2:46:00] Bottom-up versus top-down learning.

[2:56:30] Students with ADHD often feel the effects of inefficient pedagogy more strongly. Interleaving minimum effective doses of guided instruction and active problem-solving is better for everyone.

[3:06:20] AI tools as a multiplier on existing ability: the more you know, the more useful they are; the less you know, the harder it is to detect when they've gone wrong.

[3:14:37] What I'm most focused on right now: taking Math Academy from workshop to factory -- producing courses at scale without sacrificing quality.

979

132

91K

ImMrMoth retweeted

Shabnam Parveen

@shabnam_774

about 1 month ago

Instead of watching Netflix tonight, watch this Stanford lecture. It explains how ChatGPT and Claude are actually built. Stanford released it free. Bookmark this.

408

640

51K

ImMrMoth retweeted

Rohan Paul

@rohanpaul_ai

about 1 month ago

Terence Tao says the math behind today’s LLMs is actually simple. Training and running them mostly uses linear algebra, matrix multiplication, and a bit of calculus, material an undergraduate can handle. We understand how to build and operate these models. The real mystery is why they work so well on some tasks and fail on others, and why we cannot predict that in advance. We lack good rules for forecasting performance across tasks, so progress is largely empirical. A key reason is the nature of real-world data. Pure noise is well understood, perfectly structured data is well understood, but natural text sits in between, partly structured and partly random. Mathematics for that middle regime is thin, similar to how physics struggles at meso-scales between atoms and continua. Because of this gap, we can describe the mechanisms but cannot yet explain capability jumps or give reliable task-level predictions. That mismatch, simple machinery versus hard-to-predict behavior, is the core puzzle. ---- Video from 'Dr Brian Keating' YT Channel (Link in comment)

568

573K

ImMrMoth retweeted

Tom Dörr

@tom_doerr

about 1 month ago

Hands-on LLM engineering course with 8 weekly projects https://t.co/3fQjuDHMMh

506

109

644

23K

ImMrMoth retweeted

Orel Beilinson @BeilinsonOrel

about 1 month ago

A wonderful book. Unlike so many books I've read, it does not assume the reader is already into mathematics. It offers definitions, wraps chapters in biographies, and does its best to go beyond the West without neglecting it. A great gift even for precocious teenagers, I think!

BeilinsonOrel's tweet photo. A wonderful book. Unlike so many books I've read, it does not assume the reader is already into mathematics. It offers definitions, wraps chapters in biographies, and does its best to go beyond the West without neglecting it. A great gift even for precocious teenagers, I think! https://t.co/FodYVKiT6Q

198

646

42K

ImMrMoth retweeted

Abhinav Upadhyay

@abhi9u

about 1 month ago

Before jumping into books like DDIA or Database Internals, it helps to understand the systems layer these designs are built on. A lot of the design of such data-intensive systems is based on virtual memory: page tables, page faults, mmap, the page cache, swapping, NUMA placement, TLBs, and the tradeoffs between what the OS wants and what the database wants. My latest article is a ~25,000-word mini-book on virtual memory. It starts from first principles and goes all the way down to advanced topics like NUMA placement and performance debugging with tools like perf and /proc. I also wrote it differently: as a dialogue between a user-space process and the kernel. Most treatments of virtual memory are dry and fact-heavy. I wanted this one to feel more like a story, while still being technically deep. Link below.

abhi9u's tweet photo. Before jumping into books like DDIA or Database Internals, it helps to understand the systems layer these designs are built on.

A lot of the design of such data-intensive systems is based on virtual memory: page tables, page faults, mmap, the page cache, swapping, NUMA placement, TLBs, and the tradeoffs between what the OS wants and what the database wants.

My latest article is a ~25,000-word mini-book on virtual memory.

It starts from first principles and goes all the way down to advanced topics like NUMA placement and performance debugging with tools like perf and /proc.

I also wrote it differently: as a dialogue between a user-space process and the kernel.

Most treatments of virtual memory are dry and fact-heavy. I wanted this one to feel more like a story, while still being technically deep.

Link below.

172

88K

ImMrMoth retweeted

Rohit Kumar Tiwari

@_rohit_tiwari_

about 1 month ago

Mastering these NLP papers places you in the top 1% in AI. The MOST Influential: GPT, BERT, Attention, DeepSeek. https://t.co/z3RWA6DsLb - Word2Vec - BERT - GPT-2 - GPT-3 - Attention is All You Need - Mixture of Experts (MOE) - DeepSeekMath (with GRPO)

_rohit_tiwari_'s tweet photo. Mastering these NLP papers places you in the top 1% in AI.

The MOST Influential: GPT, BERT, Attention, DeepSeek.

https://t.co/z3RWA6DsLb

- Word2Vec
- BERT
- GPT-2
- GPT-3
- Attention is All You Need
- Mixture of Experts (MOE)
- DeepSeekMath (with GRPO) https://t.co/txhArhPHk1

224

219

ImMrMoth retweeted

Sebastian Raschka

@rasbt

about 1 month ago

New article: a visual tour of recent LLM architecture advances, from Gemma 4 to DeepSeek V4. I focus on long-context efficiency tweaks like KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC. Link: https://t.co/KO81y3kTH7

rasbt's tweet photo. New article: a visual tour of recent LLM architecture advances, from Gemma 4 to DeepSeek V4.

I focus on long-context efficiency tweaks like KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC.

Link: https://t.co/KO81y3kTH7 https://t.co/wTx51QpQu4

426

123K

Pramoth

@ImMrMoth

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users