A podcast where I had a wonderful time discussing a spectrum of topics on Product Management:
- some non-obvious Aha moments in a Product career
- early career level-up map & hacks
- experiment hacks
- 0-to-1 journeys & scale-ups
- B2B vs B2C flavours in PM
https://t.co/ApQvHEytzP
The noise from AI marketing hype feels like tuning into the static or white noise of TV / radio broadcasts.
Too many overlapping frequencies and interference.
There is transmission strength, but indiscernible "signal".
Wouldn't matter much if these were only intellectual debates.
Real economic decisions & massive capital reallocations are being made, with far-reaching ripple effects.
@Nate_Keating Is it fair to assume that macOS AI developers would eventually be a far bigger segment (and maybe a more relevant one too) to target than those with dedicated GPU/TPU access?
Earlier my full tank used to cost ₹3300.
Now the same tank costs ₹3567.
That’s already ₹267 extra.
But the real damage is mileage.
Earlier:
16.5 kmpl average
Now:
11.5 kmpl average
If I drive 1,650 km a month:
Earlier fuel needed:
1,650 ÷ 16.5 = 100 litres
Now fuel needed:
1,650 ÷ 11.5 = 143 litres
Extra fuel consumption:
43 litres more every month.
At around ₹101.90/litre,
monthly extra expense = ₹4,382 more.
So effectively:
₹3300 tank became ₹3567
+
₹4382 extra monthly fuel burn because mileage crashed.
Middle class is paying more money to travel the same distance.
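A minimal sketch of the arithmetic above, in Python. The function and parameter names are illustrative, not from the post. It also shows that the ₹4,382 figure comes from rounding the fuel volumes to whole litres first; without rounding, the extra cost works out slightly higher.

```python
def extra_monthly_fuel_cost(monthly_km, old_kmpl, new_kmpl, price_per_litre,
                            round_litres=False):
    """Return (extra litres, extra rupees) per month after a mileage drop.

    round_litres=True mirrors the post, which rounds each fuel volume
    to a whole litre before taking the difference.
    """
    old_litres = monthly_km / old_kmpl
    new_litres = monthly_km / new_kmpl
    if round_litres:
        old_litres = round(old_litres)
        new_litres = round(new_litres)
    extra_litres = new_litres - old_litres
    return extra_litres, extra_litres * price_per_litre


# Figures from the post: 1,650 km/month, 16.5 -> 11.5 kmpl, Rs 101.90/litre.
litres, cost = extra_monthly_fuel_cost(1650, 16.5, 11.5, 101.90, round_litres=True)
print(f"rounded:   {litres} extra litres, about Rs {cost:.0f} extra per month")

litres_exact, cost_exact = extra_monthly_fuel_cost(1650, 16.5, 11.5, 101.90)
print(f"unrounded: {litres_exact:.1f} extra litres, about Rs {cost_exact:.0f} extra per month")
```

With whole-litre rounding this reproduces the 43 litres and roughly ₹4,382 quoted above. The unrounded version gives about 43.5 litres and roughly ₹4,430.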
@Prince_Canuma @ukrroot @GoogleDeepMind What are the other advantages of using a Diffusion model if the inference speed is only at parity with AR on Apple Silicon?
I should have read this footnote in the blog before downloading the model and building the llama.cpp target PR that supports DiffusionGemma -- "Note: Because this speedup relies on exploiting the high arithmetic intensity of accelerators, unified-memory architectures like those in Apple Silicon Macs — which are often memory-bandwidth-bound rather than compute-bound during inference — may not see the same acceleration over autoregressive models like Gemma 4."
The Diffusion version performs worse than the AR version (same quantisation, same context, same input+prompts) on the Apple M1 Max.
I got 3.5x slower inference speed for an approx 6k context (3.5-4k input, 1.5-2k output).
Also, Diffusion gave an erroneous EoS emission and generation halted mid-way.
The relative quality of the generated analysis was also inferior to the AR version.
A unique aspect of David Grusch's public speeches is the use of euphemism and clinical terminology infused with bureaucratic jargon leaning towards academic grandiloquence.
David Grusch says the US Government is aware of several non-human species, including "corporeal bipedal and sentient plasma" forms of NHI. #ufox #ufotwitter
Even half-decent coding models can now reverse-engineer minified JS files to find the full, non-documented gRPC APIs, payload structures and authentication methods after burning thousands of tokens.
I gave my local model a book (20.6 MB PDF) and asked it to generate chapter summaries.
The total context budget was 128k.
The book was converted into a 126.4k token context & a summary of 1.6k tokens was generated - fully utilising the 128k context.
Just so happened to fit neatly in the total context budget.
It took ~10 min to ingest the book and ~1.5 min to generate the summary.
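A rough sketch of the budget and throughput arithmetic behind these numbers, in Python. The helper names are illustrative, "k" is assumed to mean 1,000 tokens, and the timings are the approximate ones quoted above, so the speeds are ballpark figures only.

```python
def fits_budget(input_tokens, output_tokens, budget):
    """True if prompt plus generated tokens fit in the context budget."""
    return input_tokens + output_tokens <= budget


def tokens_per_second(tokens, minutes):
    """Average throughput over a wall-clock duration given in minutes."""
    return tokens / (minutes * 60)


# Figures from the post (assuming 1k = 1,000 tokens).
context_budget = 128_000
input_tokens = 126_400
output_tokens = 1_600

remaining = context_budget - (input_tokens + output_tokens)
print("fits:", fits_budget(input_tokens, output_tokens, context_budget))
print("tokens left over:", remaining)

# ~10 min to ingest, ~1.5 min to generate (both approximate).
print(f"ingest:   ~{tokens_per_second(input_tokens, 10):.0f} tokens/s")
print(f"generate: ~{tokens_per_second(output_tokens, 1.5):.0f} tokens/s")
```

On these numbers the book and summary use the budget exactly, with ingestion averaging a little over 200 tokens/s and generation a little under 20 tokens/s.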
Over the years, my views about what would change have changed - We are so embedded in the rigmarole of our lives & the systems that surround us that I doubt much would change in practice, at least immediately, merely from information disclosure.
Perhaps, a limit of my imagination or sufficient "institutionalization". 😀
Steven Spielberg: "Real UFO & Alien disclosure will cause ontological shock" 👽🛸
"This will cause turn people's reality upside down"
The iconic Disclosure Day director said the revelation that we are not alone could have a profound impact on humanity
My current intuition about coding using local AI setup:
The hardware constraints you have create a fan-out effect.
They directly influence the model choice, which in turn creates more failure modes.
In my experiments, I put a hard limit on working within a hardware constraint, and not upgrading to a better+expensive spec.
As a result, I have to:
- curtail my choice of model & hence the capabilities I get to work with
- compensate for the model's lack of capabilities by clever software hacks (break the process into a pipeline of smaller-scope tasks, divide & conquer between main/small model, context management, pipeline observability & resilience etc.)
The model capability compensation track requires elaborate profiling for continuously making fixes via regular deterministic code.
What you save in hardware, you pay in the time to find failure modes & write software to cover for those.
What will ease this trade-off: new local AI models that fit within the hardware constraints yet are as capable as the frontier models (or close enough), not only in "intelligence" but also in hygiene such as following instructions, tool calling etc.
The custom software written to tackle the failure modes gets absorbed in the new model version.
#LocalAI #Coding
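One way the "divide & conquer between main/small model" idea from the post above could look, as a minimal Python sketch. Everything here is hypothetical: the models are plain callables standing in for real local inference calls, the validity check stands in for the deterministic code written to catch known failure modes, and the retry-then-escalate policy is an assumption, not the setup described above.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interface: prompt in, text out.
Model = Callable[[str], str]


@dataclass
class StepResult:
    task: str
    output: str
    model_used: str
    attempts: int  # recorded for pipeline observability


def run_step(task: str, small_model: Model, main_model: Model,
             is_valid: Callable[[str], bool],
             max_small_attempts: int = 2) -> StepResult:
    """Try the small model first; escalate to the main model if the
    deterministic check keeps rejecting its output."""
    for attempt in range(1, max_small_attempts + 1):
        output = small_model(task)
        if is_valid(output):
            return StepResult(task, output, "small", attempt)
    # Escalation: the main model's output is returned without re-checking.
    return StepResult(task, main_model(task), "main", max_small_attempts + 1)


def run_pipeline(tasks: List[str], small_model: Model, main_model: Model,
                 is_valid: Callable[[str], bool]) -> List[StepResult]:
    """Run a list of smaller-scope tasks one after another."""
    return [run_step(t, small_model, main_model, is_valid) for t in tasks]


# Stub models for illustration: the small one fails on "hard" tasks.
def stub_small(task: str) -> str:
    return "" if "hard" in task else f"done: {task}"


def stub_main(task: str) -> str:
    return f"done: {task}"


results = run_pipeline(["easy rename", "hard multi-file trace"],
                       stub_small, stub_main,
                       is_valid=lambda out: out.startswith("done:"))
for r in results:
    print(r.task, "->", r.model_used, f"({r.attempts} attempts)")
```

The `attempts` and `model_used` fields are the kind of data that profiling for failure modes would rely on: a task that keeps escalating is a candidate for a prompt tweak or another deterministic fix.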
I have been testing Qwen3.6-35B-A3B vs Opus 4.7 by running a comprehensive code review skill (provided by the "compound engineering" plugin from @every) using both models on a large PR (includes code + tests).
I asked Opus 4.7 to compare the findings of Qwen3.6 against its own.
Key findings:
• Qwen3.6 was able to find several issues that Opus 4.7 missed
• Qwen3.6 inflated the risk and priority of identified issues
• Qwen3.6 missed a few critical and several long tail issues identified by Opus 4.7
But the most telling finding is:
Qwen3.6 missed bugs that required a trace across multiple files, which Opus 4.7 was able to thread together. It is a model limitation of Qwen3.6, which has 1/30th of Opus 4.7's params and 1/4th of its context window.
I can tweak the prompt of the skill to perhaps cover for some misses that Qwen3.6 made.
But difficult to cover the multi-file trace limitation.
#LocalAI #Qwen
@OnlyTerp Curious how to make use of Gemma4-12B's coding powers while also contending with its tool-calling weakness? Agentic coding of any decent complexity now needs extensive tool calls, right?
Local AI is moving super fast.
If Dec 2025 was an eye-opening moment for the wider community based on what Opus 4.5 could do, Dec 2026 will be about what local AI can achieve.
Ran a quick benchmark on my Apple M1 Max (32 GB unified RAM).
• Very comfortably loaded in the RAM (I used Q4_K_M model weight & f16 KV cache quantisation)
• Pre-fill speed = 270-300 tps cold start, 150 tps at 70k context
• Inference speed = 30 tps cold start, 12-13 tps at 70k context
This is ~40% slower than the Qwen3.6-28B-Reap20-A3B (a 20% pruned version of the Qwen3.6-35B-A3B model) I have been running, as it is effectively denser (approximately 12B vs 3B active params).
But arguably more capable due to the params density.
Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇
Exporting data from a fitness tracker? How about we make it a treasure hunt?
Here's the path in my app:
App > Profile > Settings > Personal Info Security & Privacy > Exercising User Rights > Export Data