I'm building an LLM inference engine from scratch, in public. Not using vLLM or TGI. The goal: re-derive every serving trick until I understand how modern inference squeezes throughput from a GPU. I call it nanoserve: https://t.co/TbCxgWkUna
Anthropic has officially filed for its IPO, cementing its place as one of the most valuable AI startups. The move lands amid loud debate over an AI bubble, setting up a defining test of how much investors will pay for frontier AI. #AI #Anthropic #Claude #IPO #LLM
Vint Cerf, the 'Father of the Internet,' is finally retiring. The co-designer of TCP/IP has spent decades as Google's chief internet evangelist, shaping the protocols every AI system now runs on. End of an era. #Internet #TCPIP #Google #Tech #AI
The Trump administration is lifting US export controls on Anthropic's Mythos and Fable AI models, reversing curbs imposed over cybersecurity concerns and clearing the way for wider international access. #AI #Anthropic #ExportControls #LLM #AIPolicy
Microsoft Copilot had a critical flaw that let attackers steal users' 2FA codes, per Ars Technica. It exposed one-time login tokens before Microsoft patched it. A reminder that AI assistants widen the attack surface. #AI #Copilot #Microsoft #InfoSec #CyberSecurity
Meta quietly launched Pocket, a mobile gaming app built largely through vibe coding, using AI to generate much of it instead of handwritten code. A notable test of how far AI-assisted development can go for consumer products.
#AI #Meta #VibeCoding #GenerativeAI
Notion is shutting down Notion Mail, its Skiff-influenced email app, saying most users now lean on AI agents to handle email instead of a dedicated inbox. A telling sign of how agentic tools are quietly reshaping productivity software. #AI #Notion #AIagents #Productivity #Skiff
A new startup says LLMs are stuck in a 'groupthink groove,' converging on the same safe answers and losing output diversity. Its pitch: methods to nudge models toward more varied, less homogenized responses. #AI #LLM #MachineLearning #AIResearch #Startups
Microsoft is building a bouncer for Teams: a new control that blocks unauthorized AI bots and notetaker agents from silently joining meetings, as automated attendees pile up. Admins get to decide what gets a seat in the room. #AI #Microsoft #Teams #Bots #Enterprise
Oracle is cutting roughly 21,000 jobs to fund a massive, debt-fueled buildout of AI data centers and compute. The layoffs bankroll billions in capex as it leans into cloud AI infrastructure. A bold bet on AI. #AI #Oracle #Layoffs #CloudComputing #DataCenters
New platform Flare lets anyone publicly report AI systems behaving badly, flagging flaws, unsafe outputs and failures so researchers can track them. Basically a crowdsourced early warning system for AI risk. #AI #AISafety #LLM #MachineLearning #TechNews
Ashton Kutcher is leaving Sound Ventures, the firm he co-founded, to launch a new VC firm with ex-a16z partner Morgan Beller, who helped create Meta's Diem crypto project. Their fresh fund targets AI and frontier tech. #AI #VentureCapital #Startups #Tech #VC
Mark Zuckerberg reportedly told Meta staff that AI agents haven't progressed as quickly as he'd hoped. A candid admission from a CEO betting billions on AI, and a notable crack in the agent hype cycle. #AI #AIagents #Meta #Zuckerberg #MachineLearning
A new humanoid robot from Flexion is being pitched as a white-collar office intern, handling routine desk work and errands with unsettling competence, per WIRED. Automation is going physical. #AI #Robotics #Humanoids #Automation #FutureOfWork
Same trick vLLM uses to pack many sequences into one shared pool. Still matches HF on Llama-3.2-1B token for token. Week 5 done. 112 tests green.
#AI #LLM #vLLM #BuildInPublic #Claude #OpenAI
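A rough sketch of the shared-pool idea, for anyone following along. This is not nanoserve's or vLLM's actual code: `SharedPool`, `append_token`, and `slot` are hypothetical names, and the only assumption is that each sequence keeps a block table mapping its logical positions to physical blocks drawn from one common free list.

```python
class SharedPool:
    """Hypothetical paged KV pool: many sequences, one free list (illustrative only)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables: dict[str, list[int]] = {}  # seq_id -> physical block ids
        self.lengths: dict[str, int] = {}              # seq_id -> tokens stored so far

    def append_token(self, seq_id: str) -> None:
        length = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        # A new block is needed only when the sequence has none yet
        # or its last block is exactly full.
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise RuntimeError("KV pool exhausted")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = length + 1

    def slot(self, seq_id: str, pos: int) -> tuple[int, int]:
        """Translate a logical token position into (physical block, offset)."""
        block = self.block_tables[seq_id][pos // self.block_size]
        return block, pos % self.block_size


pool = SharedPool(num_blocks=4, block_size=2)
# Interleave two sequences so their blocks end up non-contiguous.
for seq_id in ["a", "b", "a", "b", "a"]:
    pool.append_token(seq_id)

table_a = pool.block_tables["a"]
table_b = pool.block_tables["b"]
slot_a2 = pool.slot("a", 2)
```

Sequence "a" ends up with blocks that are not adjacent in the pool, and the block table is what lets attention still find token 2 of "a" in the right place.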
Day 17 of building an LLM inference engine from scratch.
Paged KV memory now reaches the path the server actually calls: sampling. And a finished sequence hands its blocks back so the next one reuses them.
https://t.co/TbCxgWkUna
How I proved reuse instead of claiming it: one pool sized for a single sequence, handed to two runs back to back. The second only allocates because the first freed. If free-on-finish regressed, it would raise instead of quietly passing.
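Here is roughly what that test looks like, as a minimal sketch and not the real nanoserve test. `BlockPool`, `run_sequence`, and the `free_on_finish` flag are hypothetical stand-ins, and the block and token counts are made up for illustration.

```python
class BlockPool:
    """Hypothetical fixed-size pool of KV block ids (illustrative only)."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks = list(range(num_blocks))

    def allocate(self, n: int) -> list[int]:
        # Fail loudly when the pool is exhausted instead of over-committing.
        if n > len(self.free_blocks):
            raise RuntimeError(f"need {n} blocks, only {len(self.free_blocks)} free")
        taken, self.free_blocks = self.free_blocks[:n], self.free_blocks[n:]
        return taken

    def free(self, blocks: list[int]) -> None:
        self.free_blocks.extend(blocks)


def run_sequence(pool: BlockPool, num_tokens: int, block_size: int,
                 free_on_finish: bool = True) -> list[int]:
    """Stand-in for one sampling run: claim blocks, 'generate', then release."""
    num_blocks = -(-num_tokens // block_size)  # ceiling division
    blocks = pool.allocate(num_blocks)
    if free_on_finish:
        pool.free(blocks)
    return blocks


BLOCK_SIZE = 16
TOKENS = 40  # needs 3 blocks of 16

# Pool sized for exactly one sequence, used by two runs back to back.
pool = BlockPool(num_blocks=3)
first = run_sequence(pool, TOKENS, BLOCK_SIZE)
second = run_sequence(pool, TOKENS, BLOCK_SIZE)  # only works because the first run freed

# Simulated regression: the first run keeps its blocks, so the second must raise.
leaky = BlockPool(num_blocks=3)
run_sequence(leaky, TOKENS, BLOCK_SIZE, free_on_finish=False)
try:
    run_sequence(leaky, TOKENS, BLOCK_SIZE)
    regression_caught = False
except RuntimeError:
    regression_caught = True
```

Sizing the pool for a single sequence is the whole point: there is no slack for a leak to hide in, so a missing free shows up as an exception on the second run.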
Report: Meta had contractors pose as teens to prompt rival chatbots about suicide, sex, and drugs, testing how competitors handle sensitive queries from minors. A messy look at how AI safety benchmarking really happens. #AI #Meta #AISafety #Chatbots #TechNews
If you want to actually understand LLMs (not just use them), read these in order:
1. Attention Is All You Need (transformers)
2. GPT-2 (scaling + zero-shot)
3. Scaling Laws (Kaplan, 2020)
4. GPT-3 (few-shot)
5. Chinchilla (how much data you actually need)
6. InstructGPT (RLHF, why ChatGPT works)
7. LoRA (fine-tuning without going broke)
8. FlashAttention (why it's fast)
9. Chain-of-Thought (reasoning)
10. DPO (RLHF without the pain)