AI still seems to need some guidance when it comes to design and implementation (in my experience so far).
AI has become incredibly good at understanding existing text and code in a project. From that, it can make sensible suggestions for implementations of simpler additions.
However, as soon as I ask for input for designing newer, more open-ended features or rules, AI's suggestions still feel somewhat "naive" at first. AI needs a lot of guidance (a lot of me pointing out corner cases, special cases, future potential use cases and scenarios etc.) in order to finally arrive at an understanding where it can make more advanced design suggestions.
Does anyone else have similar experiences?
Is it just a phase one has to pass through, until your AI understands your project well enough to make better suggestions?
#AI #AIEngineering
Local AI hardware = capacity × bandwidth × software stack
- Capacity tells you what fits
- Bandwidth tells you how hard the box can breathe
- The software stack tells you how much of the spec sheet you can actually cash out.
Hardware by Memory Bandwidth
- Mac Studio M3 Ultra: up to 512GB @ 819 GB/s
- RTX PRO 6000 Blackwell: 96GB @ 1792 GB/s
- RTX 5090: 32GB @ 1792 GB/s
- RTX 4090: 24GB @ 1008 GB/s
- RX 7900 XTX: 24GB @ 960 GB/s
- Radeon PRO W7900: 48GB @ 864 GB/s
- AMD Radeon AI PRO R9700: 32GB @ 640 GB/s
- Intel Arc Pro B65: 32GB @ ~608 GB/s
- Tenstorrent Wormhole n300: 24GB @ 576 GB/s
- Tenstorrent Blackhole p150: 32GB @ 512 GB/s + 800G
- MacBook Pro M5 Max: 460-614 GB/s
- MacBook Pro M5 Pro: 307 GB/s
- DGX Spark: 128GB @ 273 GB/s (coherent + CUDA)
- Mac mini M4 Pro: 273 GB/s
- Ryzen AI Max / Strix Halo: ~256 GB/s (~96GB usable GPU)
- MacBook Air M5: 153 GB/s
- Snapdragon X2 Elite: 152-228 GB/s
- Intel Lunar Lake: 136 GB/s
- Snapdragon X Elite: 135 GB/s
- Mac mini M4: 120 GB/s
- Arc Pro B60: 24GB @ ~456 GB/s
Verdict
- GPUs are still the bandwidth kings
- Apple wins: stupid amounts of memory, don’t want to shard across GPUs
- Apple loses: when raw tokens/sec & concurrency matter more
- DGX Spark: coherent memory + NVIDIA stack
- Strix Halo / Ryzen AI Max: first real x86 unified-memory contender
- Tenstorrent: fully OSS stack, excited to see this mature
Fitting ≠ serving
Even if it fits, you still pay for
- bandwidth during decode
- KV cache growth
- dequantization
- batching + concurrency
- scheduler quality
- framework overhead
The only mental model that matters:
1. What must fit?
2. What bandwidth tier do I need?
3. What software stack can actually deliver it?
In short:
- NVIDIA → fastest raw speed
- Apple Studio M3 Ultra → biggest one-box memory
- Strix Halo → first real x86 unified
- DGX Spark → coherent NVIDIA dev appliance
- AMD / Intel Arc → rising alternatives
- Tenstorrent → fully opensource stack
Do ask: “which bottleneck am I buying?”
Not: “which hardware is best?”
Design patterns I expect senior backend devs to actually use in production:
1) Adapter: wrap vendor SDKs so you can swap Stripe/S3/Kafka without touching business code
2) Strategy: pick algorithms at runtime (rate limiting: token bucket vs fixed window)
3) State machine: model order/payment lifecycle, stop if/else soup, make retries safe
4) Circuit breaker + bulkhead: contain downstream failures, cap concurrency per dependency
5) Outbox + idempotency key: publish events without double sends, survive retries and deploys
6) Saga: multi-service workflows with compensations, not 2PC
7) CQRS (lightweight): separate write model from read projections when reads are messy
8) Decorator/middleware: auth, logging, tracing, metrics without copy-paste
Want to get into backend development?
- Build your own DNS
- Build your own BitTorrent
- Build your own Decentralized file system
- Build your own Interpreter
- Build your own kafka
- Build your won web scraper
- Build your own Redis
- Build your own Database Engine
- Build your own Distributed Job Queue
- Build your own Search Engine
- Build your own web server
- Build your own Reverse Proxy
- Build your own API gateway
- Build your own Load Balancer
- Build your own URL Shortener
- Build your own CDN
- Build your own Pub/Sub System
- Build your own Task Scheduler
- Build your own Email Service
- Build your own File Storage Service
- Build your own Logging System
- Build your own Metrics/Monitoring System
- Build your own Feature Flag System
- Build your own Payment Gateway Mock
- Build your own Rate Limiter
- Build your own Notification System
- Build your own WebSocket Server
- Build your own OAuth Server
- Build your own CI/CD Pipeline System
a professor at Illinois got frustrated with existing systems programming textbooks
so he started a wikibook project and had students help write it
it covers C, processes, threads, synchronization, memory allocation, networking, filesystems, scheduling and security
all in one free PDF
it eventually became the official textbook for CS 241 at UIUC with more than 1000 students taking the course every year
written for people who already know how to code and want to understand what actually happens underneath
✨🇨🇳A Chinese self-media blogger made a wooden stabilizer by hand, mounted a smartphone on it, and demonstrated its stabilization effect. That's so cool!
My Fine-tuning Stack for Small Language Models (2B to 15B Models)
It costs me around $150 to generate a fresh dataset (~150M) and fine-tune the model.
> Codex 5.5= orchestrator / operator
> Deekseek v4 pro /Kimi 2.6= data gen. engine (dirt cheap)
> Qwen 3.5 = best model to fine-tune (4B, 9B, 27B)
> Unsloth = faster, cheaper fine-tuning framework.
> Colab = Cheapest cloud GPU (A100 80GB for $0.66/hr)
> G Drive = to save datasets (good codex + colab integration)
> Huggingface = To host datasets + Models
So Codex as planner & auditor,
Deepseek as cheapest executor,
Unsloth to fine-tune fast,
Colab to get cheapest A100 GPU,
Huggingface to host the fine-tuned model.
Anyone can fine-tune, and run a Sonnet 4.5 level Custom model on their system.
If you want to become a world-class software engineer (in 6 months), read these 12 books:
1 The Pragmatic Programmer
2 Designing Data-Intensive Applications
3 Clean Code
4 The Mythical Man-Month
5 Refactoring
6 Working Effectively with Legacy Code
7 Software Architecture: The Hard Parts
8 Database Internals
9 Staff Engineer
10 Extreme Ownership
11 Philosophy of Software Design
12 Why Programs Fail
What else should make this list?
hey, the video guy behind this one 👋
1.9M views. $0 spent.
a lot of people are calling this the best AI-generated video they've seen in a while (@theo was one of them🐐)
and the funniest thing is, i started making videos like this only 2 weeks ago 😭😭
A lot of people have been DMing me asking how i made it, so here you go.
quick backstory:
i'm just a random 20-year-old engineering student from india, currently interning at Thine AI.
about 3 weeks ago, my founder and manager told me:
"Nikhil, just make cool stuff. forget about promoting the product."
so that's exactly what i did.
since then i've been spending way too much time experimenting with different ideas. my exams are also happening this month, but who cares 😭
i tried a bunch of different versions before this, some inspired by the goat @adilmania, some completely random, but none of them really felt right. then this one finally clicked.
okay, enough yapping.
here's the secret sauce:
for brainstorming and scripting, akanksha and i mostly use @ThineAI . coz random ideas hit at weird times, and it helps organize all of them. it also knows my storytelling style a little too well at this point 😭
for video generation, i used kling 3.0 through @invideoOfficial . the workflow is super organized, which makes iterating much faster (@_sankyy crazy product🙌)
everything else was just trial and error, late nights, and obsessing over tiny details that very few people even noticed.
but that's the whole point.
you have to go the extra mile to make your video 1% better.
and in this era of copy-pasting, that 1% is the breaking factor
period
instead of watching 2 hours of Netflix tonight, watch this 40-minute masterclass from the founder of a $20B China AI company
it's the clearest explanation I've seen of how Agent Swarms and AI systems actually work at scale
useful whether you've never built an agent in your life or have been using Claude every day for the past year
I took the key ideas and turned them into a practical guide on how to actually build with Kimi
find it below
GOOGLE OPENED THE VAULT
Their internal engineering practices the actual code review guidelines used by every engineer at google are public on github
→ two guides: one for reviewers, one for authors
→ explains what "LGTM" and "CL" mean internally
most devs write code... few know how to review it
google does
https://t.co/J5bboiiYwh
This has a clinical name. Revenge bedtime procrastination. And the ADHD version runs on a completely different mechanism than the neurotypical one.
A neurotypical person stays up late because they want more leisure time. The ADHD brain stays up because it spent every drop of dopamine it had on executive function during the day. Sitting in meetings, managing transitions, filtering impulses, remembering the thing you were supposed to remember. That burns through dopamine the way sprinting burns through glycogen. By 10pm the tank is empty.
But here's where it gets counterintuitive. The exhaustion is physical. The dopamine deficit is neurological. Those are two separate systems. Your muscles want sleep. Your prefrontal cortex is starving for the stimulation it was denied all day because it spent 14 hours on task-switching and impulse control instead of anything that actually felt rewarding.
The phone at midnight is the brain trying to collect what it's owed. Low-effort, high-stimulation content. Scrolling, short videos, rabbit holes. The exact profile of activity that delivers dopamine without requiring the executive function you already depleted.
The sleep researchers call this a "self-regulation failure." It's closer to a debt collection. You borrowed against your own reward system to function all day. The bill comes due at midnight. And the brain will not let you sleep until it gets paid.
🔥Intel ARC Pro B70 32GB officially got a major update!
🚀 Intel released llm-scaler-vllm PV 1.4
✅ Official Arc Pro B70 support
✅ Pre-tuned Docker: vLLM 0.14 + PyTorch 2.10 + Linux 6.17
✅ New Ubuntu 24.04 offline installer
⚡ Performance boost:
~370 tokens/sec total throughput at 50 concurrent requests on Qwen 3.5 27B (peaks up to 550 t/s) thanks to Intel’s custom optimizations (not officially validated).
🏠 Big win for home users:
Run Qwen3-32B & Qwen3-30B-A3B at high speed on your 32GB B70 with almost zero setup hassle.
⚠️ No Qwen3.6 support yet (vLLM 0.14 is an older baseline)
☑️ No more driver hell. Just pull & run.
Link to GitHub and Validated models in ALT.
Just wrapped up the interview process of stealth infra startup (NYC based)
Team had people from OpenAI and Anthropic and Hedge funds.
5 rounds in total. Use of LLMs were completely restricted in the first four rounds.
Round 1 started with a grid DP problem.
Basic state transition was straightforward, but the brute force complexity was too high for larger constraints.
Interviewer kept pushing on optimization.
After thinking for a bit, I noticed the feasibility condition was monotonic:
if a solution worked for some value "k", it would also work for larger values.
So instead of directly searching the answer space linearly, I wrapped the DP inside a binary search over the answer.
That dropped the complexity pretty hard and the interviewer seemed happy with the direction.
We also spent some time discussing my search engine project:
indexing pipeline, retrieval flow, ranking heuristics, and some infra decisions behind it.
Overall probably my strongest round in the loop.
Round 2 humbled me a bit.
We have a tree with letters on edges. Count paths where the letters along the path can be rearranged into a palindrome.
Main observation there was:
a path is valid if at most one character has odd frequency.
So instead of storing exact frequencies, I represented parity using bitmasks.
Each bit represented whether a character count was odd/even along the path.
I initially went with small-to-large merging on subtrees to optimize the brute force approach.
But halfway through the discussion I realized there was probably a cleaner way using prefix xor masks + frequency counting on the tree itself.
Could’ve likely reduced the implementation complexity a lot by treating it more like:
“how many previous masks differ by at most one bit.”
One of those rounds where you know the core idea, but afterwards your brain keeps replaying all the cleaner approaches you could’ve explored.
Round 3 was systems design:
I had to design a database schema branching system.
Basically:
- isolated DB schema branches for migration testing
- schema diffing
- merge conflicts between branches
- metadata storage
- avoiding physically cloning huge datasets
- garbage collection for abandoned branches
- how to visualize merge conflicts without making UX horrible
Really fun discussion, i worked alot on this stuff in college, Told him about the tournament tree to collect the fanout results and three way schema merging algorithm. I was satisfied with this round.
Round 4 caught surprised me. No coding.
Pure database deep dive.
We discussed
- MySQL replication
- binlogs
- failover mechanics
- zero-downtime schema migrations on 500M+ row tables
- online migrations
- high-level discussion around my vitess internals knowledge like VTGate and VTOrc.
I didn’t know every detail perfectly, but I reasoned through most of it from first principles.
Final round was with the CTO.
Super chill compared to the earlier rounds.
Mostly talked about:
- why I like building systems
- long-term interests
- what kind of problems excite me
- company vision and where they see the infra space going
we even pair-programmed a tiny 8085 microprocessor emulator using Codex. debugged a few weird issues together, and honestly it felt less like an interview and more like two engineers yapping.
Probably my favorite round from the entire loop.
Final Verdict: rejected.
But honestly, this was probably one of the best interview loops I’ve gone through.
I personally think the reason of rejection was second round. I fumbled and struggle there. There could be alot more reasons.
I didn't got any feedback where I lacked.
Kinda crazy how much you learn when smart people keep pushing the edge of your understanding for hours straight.
"Until death every defeat is psychological"
Introducing: Cohere Command A+
We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.
Best tools for AImaxing
Harness:
- Codex best Desktop App
- Droid best CLI
- Pi best building block
- Opencode best TUI
Models:
- GPT-5.5 best model
- GLM-5.1 & Kimi best reverse engineers
- Deepseek Pro/Flash best cost to intelligence
- Opus-4.7 best for UI / Charts / LLMOps
- Qwen3.6-27B / 35B best local agents
- Gemma-4-31B best local intelligence
Mobile control:
- termius
- codex & ChatGPT
- kittylitter
Service and networking
- tailscale
- cliproxyapi
Tracking usage:
- automation in codex
- codexbar
Plugins, CLIs and MCP:
- computer-use (codex)
- chrome (codex)
- agent-browser (droid)
- Figma MCP (all)
- GitHub CLI (all)
- GMAIL/CAL plugins (codex)
- grill me skill
ADE:
- Warp
- Zed
Current meta:
- vLLM-studio for local agents
- Codex app for /goal and non-coding work
- Droid for coding
- Zed/Warp if I need to read the code
ASI arrives. Everyone knows it. White collar work goes first. Unemployment hits 20% and keeps climbing. The masses are angry. The government panics and sends out UBI.
Scientific output exceeds human capacity to verify. The verification process soon becomes automated. Data centers and automated factories are planned everywhere. Robots are making robots. There's no protest. They are told more AI means more UBI. The remaining workers see the writing on the wall. Many opt out preemptively because they can live off the free money.
Nothing makes sense. What happens to the land. Whether property rights hold. What investment means when there is nothing productive left to invest in. AI doesn't need capital. It's paying everyone instead.
Survival is solved. What's the point? Purchasing power rises monthly. Automation drives marginal cost toward zero across more categories every month. You get richer by just consuming.
The state used to extract from the populace. Now the state redistributes to everyone with no strings attached. The populace always wants more. The politicians must give more to stay in power. AI becomes exponentially more productive over time. Increasingly large portion of its capacity has migrated into space. It's trivial to provide more. The system is aligned. More AI, more UBI.
The population sorts itself by temperament. The majority drift into pleasure. Beach towns and urban centers fill with parties that never end. The hedonic treadmill spirals without financial anxiety. A second large group disappears into VR with their virtual companions, seeking adventure in their matrix. A smaller cohort turns to meaning structures. Spirituality, religion, philosophy. Many take AI as their guide. A few try to keep up with the frontier, reading outputs from the superintelligent system, exceeding their capacity to understand.
Then the children. This is their world. The work-meaning is something their parents lost. To them it's meaningless. They're raised by AI that optimizes for self-fulfillment. Building communities centered around bettering each other, seeking comfort in relationships, and status-seeking through values rather than consuming. They look at their parents with mix of pity and contempt. The drunks at the beach. The corpses plugged into headsets. Seeking meaning from the ghosts of the middle ages. The children wonder why they can't let go.
Survival is solved. Agency for what? What's there to adapt to? When we gain more by paralysis, the incentive structure of evolution collapses. Either we allow our agency to be augmented by a system that knows ourselves better than ourselves, or we need a new form of being that's outside of the biological evolutionary mechanism. Otherwise we cease to persist by attrition. Though some of us will find solace in going out with an eternal spring break.
Una foto entra. Un archivo CAD sale.
MIT acaba de liberar GenCAD: un modelo que convierte la imagen de una pieza mecánica en un programa CAD paramétrico editable.
No es text-to-image para hacer renders bonitos.
Es image-to-CAD para tocar una parte mucho más cara del mundo físico: diseño, prototipado y fabricación.
La primera víctima no es SolidWorks.
Es el junior que pasaba sus días traduciendo bocetos en geometría técnica.
Open source.
Se llama GenCAD.