Today’s goodies on video automation:
OpenMontage - 987 stars today
OpenMontage is the first open-source agentic video production system, with 12 pipelines, 52 tools, and 500+ agent skills that work together autonomously. Think of it as a full video production crew that runs on its own, from scripting to final render. The fact that this is fully open source makes it even more impressive.
https://t.co/xLhLhzdaYp
palmier-pro - 1,834 stars today
Palmier Pro is a macOS video editor built specifically for AI workflows, letting you work with video the same way you prompt a model. Instead of wrestling with traditional timelines, you describe what you want and the AI handles the editing heavy lifting. This is what the next generation of creative tools looks like.
https://t.co/kwRE84Wf8Z
Fugu stands shoulder-to-shoulder with leading models like Fable and Mythos across the industry's most rigorous engineering, scientific, and reasoning benchmarks.
Read the full blog: https://t.co/2ZJbdWqCUj
Beyond Bigger Models: Why Orchestration Models Are the Next Frontier
Progress in AI has been driven largely by giant, monolithic models. But the most powerful systems of the future will be collaborative ecosystems.
Today, this orchestration is no longer just a technical optimization. It has become a geopolitical and operational imperative.
For an organization or a nation, relying on a single company's model for critical infrastructure, finance, or governance is a material vulnerability. This risk is no longer a hypothetical possibility, but a reality.
As we have seen with recent export controls imposed on models like Fable and Mythos, access can disappear overnight.
Collective intelligence is the practical hedge against this concentration of power. Because Fugu orchestrates an underlying pool of swappable agents, it simply routes around vendor restrictions.
By orchestrating the world’s models, we are delivering the resilient blueprint required for true AI sovereignty.
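The "routes around vendor restrictions" idea can be sketched as a fallback router over a pool of swappable agents. The post does not describe how Fugu actually routes requests, so everything here is an assumption for illustration: `Agent`, `ProviderUnavailable`, `route`, and the two stand-in agents are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error raised when a provider can no longer be reached."""


@dataclass
class Agent:
    name: str
    call: Callable[[str], str]


def route(prompt: str, agents: Sequence[Agent]) -> tuple[str, str]:
    """Try agents in priority order and return (agent name, answer) from the first that works."""
    errors = []
    for agent in agents:
        try:
            return agent.name, agent.call(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{agent.name}: {exc}")
    raise RuntimeError("no agent available: " + "; ".join(errors))


def blocked(prompt: str) -> str:
    # Stand-in for a vendor whose access disappeared overnight.
    raise ProviderUnavailable("access revoked")


def local_echo(prompt: str) -> str:
    # Stand-in for a self-hosted model that is still reachable.
    return f"local answer to: {prompt}"


agents = [Agent("vendor-a", blocked), Agent("local-model", local_echo)]
print(route("summarize this report", agents))
```

The caller only sees the pool, so dropping or adding a provider is a change to the `agents` list and nothing else.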
It's quite crazy that this third-party app has 10x more features & integrations than the official Whoop app.
+ Works without the subscription💸
+ Is Open Source
https://t.co/VfhQWPogKJ
Releasing models soon on @huggingface :
- SuperGemma4-12b-abliterated
- SuperMiniMax-M3-abliterated
- SuperGLM-5.2-abliterated
Cybersecurity dataset versions will not be released due to regulation issues.
You will also be able to use them on decentralized inference @c0mputeAI
Local AI hardware = capacity × bandwidth × software stack
- Capacity tells you what fits
- Bandwidth tells you how hard the box can breathe
- The software stack tells you how much of the spec sheet you can actually cash out.
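A rough way to answer "what fits" is to estimate the weight footprint from parameter count and quantization width. This is a minimal sketch: the 80% headroom factor (space left for KV cache and runtime overhead) and the 70B example model are assumptions, not figures from the post.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, headroom: float = 0.8) -> bool:
    """True if the weights fit within an assumed usable fraction of memory."""
    return weight_footprint_gb(params_billion, bits_per_weight) <= memory_gb * headroom


# Hypothetical 70B-parameter model quantized to 4 bits per weight.
for box, memory_gb in [("32GB card", 32), ("96GB card", 96), ("512GB unified", 512)]:
    print(box, fits(70, 4, memory_gb))
```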
Hardware by Memory Bandwidth
- Mac Studio M3 Ultra: up to 512GB @ 819 GB/s
- RTX PRO 6000 Blackwell: 96GB @ 1792 GB/s
- RTX 5090: 32GB @ 1792 GB/s
- RTX 4090: 24GB @ 1008 GB/s
- RX 7900 XTX: 24GB @ 960 GB/s
- Radeon PRO W7900: 48GB @ 864 GB/s
- AMD Radeon AI PRO R9700: 32GB @ 640 GB/s
- Intel Arc Pro B65: 32GB @ ~608 GB/s
- Tenstorrent Wormhole n300: 24GB @ 576 GB/s
- Tenstorrent Blackhole p150: 32GB @ 512 GB/s + 800G
- MacBook Pro M5 Max: 460-614 GB/s
- MacBook Pro M5 Pro: 307 GB/s
- DGX Spark: 128GB @ 273 GB/s (coherent + CUDA)
- Mac mini M4 Pro: 273 GB/s
- Ryzen AI Max / Strix Halo: ~256 GB/s (~96GB usable GPU)
- MacBook Air M5: 153 GB/s
- Snapdragon X2 Elite: 152-228 GB/s
- Intel Lunar Lake: 136 GB/s
- Snapdragon X Elite: 135 GB/s
- Mac mini M4: 120 GB/s
- Arc Pro B60: 24GB @ ~456 GB/s
Verdict
- GPUs are still the bandwidth kings
- Apple wins: when you need stupid amounts of memory and don’t want to shard across GPUs
- Apple loses: when raw tokens/sec & concurrency matter more
- DGX Spark: coherent memory + NVIDIA stack
- Strix Halo / Ryzen AI Max: first real x86 unified-memory contender
- Tenstorrent: fully OSS stack, excited to see this mature
Fitting ≠ serving
Even if it fits, you still pay for
- bandwidth during decode
- KV cache growth
- dequantization
- batching + concurrency
- scheduler quality
- framework overhead
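Two of these costs are easy to put rough numbers on. The sketch below uses the common rule of thumb that single-stream decode on a dense model reads every weight once per token, so bandwidth divided by weight size gives a ceiling, not a prediction. The 40 GB model and the layer/head shapes are hypothetical; batching, MoE sparsity, and framework overhead are ignored.

```python
def decode_tokens_per_sec_ceiling(bandwidth_gb_s: float, weight_gb: float) -> float:
    """Upper bound on batch-1 decode speed if each token reads all weights once."""
    return bandwidth_gb_s / weight_gb


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size for one sequence: keys plus values (factor 2) in every layer."""
    total_bytes = 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1e9


# Hypothetical 40 GB quantized model on two boxes from the list above.
for name, bandwidth in [("RTX PRO 6000 Blackwell", 1792), ("Mac Studio M3 Ultra", 819)]:
    ceiling = decode_tokens_per_sec_ceiling(bandwidth, 40)
    print(f"{name}: at most {ceiling:.1f} tok/s")

# Hypothetical 64-layer model, 8 KV heads of dim 128, 32k context, fp16 cache.
print(f"KV cache per sequence: {kv_cache_gb(64, 8, 128, 32_768):.2f} GB")
```

Real throughput lands below the ceiling, and the KV cache figure is multiplied by the number of concurrent sequences.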
The only mental model that matters:
1. What must fit?
2. What bandwidth tier do I need?
3. What software stack can actually deliver it?
In short:
- NVIDIA → fastest raw speed
- Apple Studio M3 Ultra → biggest one-box memory
- Strix Halo → first real x86 unified
- DGX Spark → coherent NVIDIA dev appliance
- AMD / Intel Arc → rising alternatives
- Tenstorrent → fully open-source stack
Do ask: “which bottleneck am I buying?”
Not: “which hardware is best?”
Web scraping will never be the same.
(100% open-source visual search at scale)
PixelRAG is a retrieval system that skips HTML parsing completely.
Instead of scraping a page into text and embedding chunks, it screenshots the page and retrieves the image. A vision-language model reads the answer straight off the pixels.
Why that matters: parsing is where web RAG quietly loses information.
- A single HTML-to-text parser can drop 40%+ of a page.
- Tables, charts, and layout get flattened or thrown out.
- Swapping parsers alone can move accuracy ~10 points on the same docs.
PixelRAG indexes the page a person actually sees. The team built a visual index of all of Wikipedia, 30M+ screenshots, and it still beats the strongest text RAG baseline by 18.1% on text-only QA.
The repo also ships a Claude Code plugin that gives Claude eyes.
It lets Claude screenshot any URL and read the rendered page instead of scraping the DOM. So you can hand it a live page, an arXiv paper, or your local site and ask what it actually looks like.
One setup script. No MCP server, no backend.
How the pipeline works:
- Renders each document (web, PDF, image) to image tiles.
- Embeds them with Qwen3-VL-Embedding, LoRA fine-tuned on screenshots.
- Builds a FAISS index and serves a search API.
A stronger reader model lifts accuracy with no re-indexing, since the index is just pixels.
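The tile-then-search shape of the pipeline can be sketched in plain Python. This is not PixelRAG's code: the tile height, the three-dimensional vectors, and the brute-force cosine search are assumptions standing in for the real renderer, the Qwen3-VL-Embedding model, and the FAISS index.

```python
import math


def tile_boxes(page_width: int, page_height: int, tile_height: int):
    """Split a rendered page into full-width (left, top, right, bottom) tiles."""
    boxes = []
    for top in range(0, page_height, tile_height):
        boxes.append((0, top, page_width, min(top + tile_height, page_height)))
    return boxes


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_vec, k=3):
    """index is a list of (tile_id, vector); return the k most similar tile ids."""
    ranked = sorted(index, key=lambda item: cosine(item[1], query_vec), reverse=True)
    return [tile_id for tile_id, _ in ranked[:k]]


# Hypothetical precomputed tile embeddings (a real system would get these from the embedding model).
index = [
    ("page1#tile0", [0.9, 0.1, 0.0]),
    ("page1#tile1", [0.1, 0.9, 0.0]),
    ("page2#tile0", [0.0, 0.2, 0.9]),
]

print(tile_boxes(1280, 2500, 1024))
print(search(index, [0.8, 0.2, 0.1], k=2))
```

Because the index stores only tile ids and vectors, the retrieved tiles can be handed to any reader model, which is why swapping in a stronger reader needs no re-indexing.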
Everything is open-source under Apache-2.0.
GitHub repo: https://t.co/qun9TjAdmw
Talking about RAG, I recently wrote an article on a new approach that makes retrieval much more efficient by cutting corpus size by 40x, reducing tokens per query by 3x, and improving vector search relevance by 2.3x.
Don’t use a Kindle to run Codex. Try a Boox instead.
It runs Android OS, so you can just use the official ChatGPT app.
My top 3 E Ink devices for Codex / ChatGPT:
- Note Air 5C
- Go 7 B/W
- Palma 2 B/W
Note Air 5C:
- Pros: Large screen, very fast refresh rate, smooth scrolling, minimal flashing/ghosting, great for notes, docs, planning, ChatGPT, Codex output
- Cons: darker screen because it has a Kaleido color layer. Not a big deal, but it’s not as crisp as the Kindle. It’s also pretty pricey at ~$529
A more portable option is Boox Go 7 (reader shape) or Palma 2 (phone shape).
Software-wise it’s the same as the Note Air 5C but you have a monochrome display → crispy text and better battery life.
Demo:
- Left window: Codex in the official ChatGPT app
- Right: browser tab with serve-sim so I can see changes in iOS simulator
- Input: no keyboard needed. I use an AI dictation keyboard and it works very well.
For some reason the built-in voice input in the ChatGPT app never works. Someone from the Codex Android team please fix 🥲
The biggest side hustle trend for late 2026:
Setting up a small data center rack in your garage and selling AI inference.
Bookmark this post and come back to it later.
Let me make your life better.
Sit down and turn all your work into goals/loops, connect your phone with codex or dispatch/code or droid computer.
Go outside and enjoy nature. If you’re curious, drop in and give guidance.
Example for web developers:
- /goal go over every single feature in this app, create a user story with expected behaviour based on the code, and keep a single canonical spreadsheet tracking each feature's status
- when done switch loop to testing every user story and documenting all errors
- when done fix every logistical error or ux error
- test every user behaviour again post fix
Give the model computer use, and give it full permissions on some computer.
———
Example for researchers:
- /loop create a continuous workflow to research documents and tests related to expert pruning and merging
- for each of those, create a plan that can run on 2 GPUs and do a full experiment that helps increase retained expert saliency post pruning
- once done run 50 tests against previous prompts which lead to model degeneracy, attractor collapse, and overthinking
- at temperature 0 measure if any improvements were made without hurting success on coding
- you have 8x B200s which must never be idle: curate a list of experiments to run every cycle, and if the GPUs are idle and I’ve not given guidance, run and maintain an experiment.
Crazy if true!
The challenge with 'decentralised' AI is that a single provider (usually an individual) very rarely has a powerful enough machine/GPUs to serve the large models that are actually in demand
By being able to shard the model and aggregate compute across several providers, you enable both larger models to be served and smaller providers to contribute in a decentralised way
They're not the first to have this vision, but this is the most progress I've seen on this front so far. Kudos
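One simple way to picture sharding across providers is a pipeline-style split, where each provider holds a contiguous block of layers sized in proportion to its memory. This is only an illustration: the post does not say how the project actually partitions models, and `assign_layers` and the provider names are hypothetical.

```python
def assign_layers(num_layers: int, provider_memory_gb: dict[str, float]) -> dict[str, range]:
    """Give each provider a contiguous layer range proportional to its memory."""
    total_memory = sum(provider_memory_gb.values())
    names = list(provider_memory_gb)
    assignment = {}
    start = 0
    cumulative = 0.0
    for position, name in enumerate(names):
        cumulative += provider_memory_gb[name]
        if position == len(names) - 1:
            end = num_layers  # the last provider takes whatever remains
        else:
            end = round(num_layers * cumulative / total_memory)
        assignment[name] = range(start, end)
        start = end
    return assignment


# Hypothetical pool: two 24GB cards and one 48GB card serving an 80-layer model.
plan = assign_layers(80, {"garage-a": 24, "garage-b": 24, "workstation": 48})
for provider, layers in plan.items():
    print(provider, f"layers {layers.start}-{layers.stop - 1}")
```

The sketch does not check that each block actually fits on its provider or account for network latency between hops, both of which a real scheduler would need.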
DeusData/codebase-memory-mcp (+2,308 stars today)
AI coding agents are terrible at understanding large codebases because they read files one by one and run out of context fast. This tool indexes your entire project into a persistent knowledge graph in milliseconds, so agents can trace function calls and dependencies in a single query instead of scanning thousands of files. One team cut their token usage by 99% just by switching to it.
https://t.co/IEl630ODmc
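To see why a graph query beats reading files one by one, here is a tiny in-memory call graph built with Python's standard `ast` module. It is a sketch of the general idea only: the real tool's indexing, storage format, and query language are not described in the post, and `build_call_graph`, `callers_of`, and `reachable_from` are hypothetical helpers.

```python
import ast
from collections import defaultdict


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    graph = defaultdict(set)
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # register the function even if it calls nothing
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    graph[node.name].add(child.func.id)
    return dict(graph)


def callers_of(graph: dict[str, set[str]], target: str) -> list[str]:
    """Functions that call `target` directly."""
    return sorted(fn for fn, callees in graph.items() if target in callees)


def reachable_from(graph: dict[str, set[str]], start: str) -> set[str]:
    """Every name reachable from `start` by following calls transitively."""
    seen, stack = set(), [start]
    while stack:
        fn = stack.pop()
        for callee in graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


SAMPLE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def main():
    print(parse("data.txt"))
'''

graph = build_call_graph(SAMPLE)
print(callers_of(graph, "load"))
print(sorted(reachable_from(graph, "main")))
```

Once the graph exists, "who calls `load`?" is a dictionary lookup, so an agent does not have to spend context re-reading source files to answer it.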