Got into running AI models on my own hardware, testing, building agents, and sharing what actually works. Follow for daily insights, hardware tips, and no-BS tech talk (+ an additional rant from time to time).
Interesting, I'll give them that, but I wonder if it's too little too late. I hope it isn't, cause I actually want to see Windows on ARM get good, provided MS doesn't screw it up.
🚨BREAKING: Microsoft just unveiled the Surface Laptop Ultra and it is the first Windows PC built to go directly at the MacBook Pro.
This is not another Windows laptop. It runs an NVIDIA ARM chip with RTX Spark graphics and up to 128GB of unified memory. The same unified memory architecture that made Apple Silicon untouchable is now sitting inside a Windows machine.
It is built specifically for developers who want to run AI agents locally without touching the cloud. That is the exact use case Apple has owned for the last three years.
The design is ultrafine. The specs are legitimate. The timing is deliberate.
Microsoft watched Apple take the premium laptop market for years and built a direct answer. Surface Laptop Ultra launches this fall.
Apple was used to being the only serious option in this space.
That changes now.
New model from the Google team, high performance with only 12B parameters, fits in 8 to 16gb of vram.
The first haft of 2026 is really cooking, imagine what the second half will bring!
Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇
@TheGalox_ Isn't 100w a bit too low? The Spark is rated at 240w, even though real world is more la 140 - 170w, so if they got close to 100w what did the cut? I wonder if Nvidia purposefully crippled the performance to keep it under the DGX.
AI video editors can't edit what isn't indexed. Learn how a developer used Gemma 4 31B locally on a 5-year-old laptop to process and index a year of raw, unlabeled video, making it fully searchable. A great look at building local-first tools.
Cloud API costs rising + rate limits throttling production = local LLM inference isn't idealistic - it's a necessity and it's economical.
Run the numbers: your 3090 running Qwen3.6 or Gemma locally vs. API calls.
If we scale in hardware, we also scale in performance.
NVIDIA JetPack 7.2 + NemoClaw on Jetson means you can deploy physical AI agents with local processing on robotics/IoT today. No more latency, no more privacy leaks.
AI agent robots near us soon.
#Jetson#EdgeAI
I'm wondering what chaos will ensue after the EU AI Act provisions make their entry in August. Who do you think gets fined first?
Some people want to innovate, others want to regulate...
@PabloSabbatella Then the next one should be public and without warning. They say all that crap about responsible disclosure and bug bounty programs, but nothing. It's really getting old.
This video shows the latest progress: better framerate, particles, animations, lighting, NPCs, AI, player UI, armor pickup effects, mipmaps for improved performance and visuals, and more. Here's how Half-Life is shaping up on a Nokia N95
#Nokia#Symbian#Valve#Steam#HalfLife
AI model terms 101:
Quantization or quant is the process of compressing an LLM to reduce the compute and memory requirements, by converting all the numerical values from high precision data to lower precision data (for example from 16bit to 4bit).
You then have a model which is much smaller and needs much less resources, but the trade-off is that the process introduces approximation errors, which reduce the quality of the output and overall reasoning capabilities.
By end of 2026, I predict:
- "Local AI" won't be a niche anymore, it'll be the first choice for developers
- Agents will start to become mainstream
- Enterprises will find themselves forced to adopt self-hosted models because of privacy and the need to eliminate shadow IT.
- The "GPU arms race" will shift from datacenters to desk setups
- Cloud API costs will go down as competition heats up
The next 6 months will be wild, and I can't wait to take part.
#AI #FutureTech #OpenSource
The next evolution of Hermes Agent is here!
Introducing Hermes Desktop: everything you love about Hermes, now native on your machine.
First demoed in Jensen's GTC keynote, it's now in public preview.
Pro tip for running local LLMs faster:
Use GGUF quantized models (Q4_K_M or Q5_K_M).
You get ~95% of full precision quality with 3-4x less VRAM usage.
A 70B model that won't fit in your 24GB card? Now it does.
Your wallet and your GPU will thank you.
#LocalLLM#Optimization #AI
@JohnnyNel_ Exactly, today one of my pipelines broke, but I quickly found the problem. The first time that happened it wasn't so quick, cause I didn't have measures in place. Hardway indeed.
Given that many people will be using harnesses like Hermes Agent in the very near future, it will be highly important to have ways to identify what went wrong in an automated pipeline. Put logs in your pipeline, checks also, have one subagent check flows. If something does fail, you and your agent can quickly identify the problem, instead of digging hours to see what happened.