0xNeoArch

@0xNeoArch

Software engineer • AI orchestration • Agentic AI SDLC frameworks • Local AI & hardware • Infra • DevOps • Trading • Biohacking

Sharing what works

Joined September 2024

4K Following

211 Followers

599 Posts

0xNeoArch

@0xNeoArch

about 1 hour ago

Today’s goodies on video automation:

OpenMontage - 987 stars today
OpenMontage is the first open-source agentic video production system, with 12 pipelines, 52 tools, and 500+ agent skills that work together autonomously. Think of it as a full video production crew that runs on its own, from scripting to final render. The fact that this is fully open source makes it even more impressive.
https://t.co/xLhLhzdaYp

palmier-pro - 1,834 stars today
Palmier Pro is a macOS video editor built specifically for AI workflows, letting you work with video the same way you prompt a model. Instead of wrestling with traditional timelines, you describe what you want and the AI handles the editing heavy lifting. This is what the next generation of creative tools looks like.
https://t.co/kwRE84Wf8Z

0xNeoArch

@0xNeoArch

about 2 hours ago

🤯🤯🤯

Sakana AI

@SakanaAILabs

about 7 hours ago

Fugu stands shoulder-to-shoulder with leading models like Fable and Mythos across the industry's most rigorous engineering, scientific, and reasoning benchmarks.

Read the full blog: https://t.co/2ZJbdWqCUj

Beyond Bigger Models: Why are Orchestration Models the Next Frontier

Progress in AI has been driven largely by giant, monolithic models. But the most powerful systems of the future will be collaborative ecosystems.

Today, this orchestration is no longer just a technical optimization. It has become a geopolitical and operational imperative.

For an organization or a nation, relying on a single company's model for critical infrastructure, finance, or governance is a material vulnerability. This risk is no longer a hypothetical possibility, but a reality.

As we have seen with recent export controls imposed on models like Fable and Mythos, access can disappear overnight.

Collective intelligence is the practical hedge against this concentration of power. Because Fugu orchestrates an underlying pool of swappable agents, it simply routes around vendor restrictions.

By orchestrating the world’s models, we are delivering the resilient blueprint required for true AI sovereignty.

220

703

438K

0xNeoArch retweeted

孤桜ETH

@GYLQ520

1 day ago

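Hermes's community ecosystem has absolutely exploded lately, and developers of every stripe are pulling off all sorts of clever tricks with it.

Honestly, whether a tool lasts does not depend on how hard the official team hypes it; it depends on whether the community is building things. Right now, developers across the web have spontaneously built Hermes a complete "parallel universe": content creation, video editing, image generation, writing, and coding, with someone filling the gaps along the whole chain.

You thought Hermes was just an agent framework? No, it is an ecosystem now.

Let me break it down one by one:

1️⃣ skill-autoshorts
🔗 https://t.co/TU648OTH3h…
A fully automated short-video pipeline. Drop in a long video and it automatically cuts highlight clips, pushes them to your phone for approval, publishes them once approved, and reviews engagement data every week to improve itself. Put plainly, an agent runs your TikTok and Reels for you, and you only need to tap confirm now and then. Manual editing in this niche is really going to be out of a job.

2️⃣ codex-image
🔗 https://t.co/swrrAdEGyC…
A locally run image-generation plugin: zero API keys, with images delivered directly in Telegram and WeChat. No fees, no internet connection, and no worry about your data ending up on someone else's server. If you are privacy-sensitive or want to save money, you can take this and use it as-is.

3️⃣ humanizer
🔗 https://t.co/kK4YWCaFuQ…
29 patterns that automatically strip out the "AI flavor", and it can also calibrate the style to your own voice. AI-written text is spotted at a glance these days, and this tool exists specifically to solve that. Run your reports, copy, and emails through it, and the result really does not read as machine-written.

4️⃣ vscode-acp
🔗 https://t.co/YxQPZ34Y1f…
A native VS Code extension that supports switching between multiple Hermes CLIs and manages skills directly in the editor. Developers no longer have to hop back and forth to the terminal: the agent lives in your editor and stands by while you write code.

One framework, four directions, all built by the community itself. This is what a tool that is truly alive looks like.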

229

345

15K

0xNeoArch retweeted

hoeem

@hooeem

about 14 hours ago

https://t.co/xTA9Mdowhi

124

208

26K

0xNeoArch retweeted

Parth Jadhav

@ParthJadhav8

about 17 hours ago

It's quite crazy that this third-party app has 10x more features & integrations than the official Whoop app.

+ Works without the subscription 💸
+ Is open source

https://t.co/VfhQWPogKJ

341

262

64K

0xNeoArch retweeted

Jun Song

@jun_song

1 day ago

Releasing models soon on @huggingface:

- SuperGemma4-12b-abliterated
- SuperMiniMax-M3-abliterated
- SuperGLM-5.2-abliterated

Cybersecurity dataset versions will not be released due to regulation issues.

You will also be able to use them on decentralized inference @c0mputeAI

719

407

47K

0xNeoArch retweeted

Ahmad

@TheAhmadOsman

about 22 hours ago

Local AI hardware = capacity × bandwidth × software stack

- Capacity tells you what fits
- Bandwidth tells you how hard the box can breathe
- The software stack tells you how much of the spec sheet you can actually cash out

Hardware by Memory Bandwidth
- Mac Studio M3 Ultra: up to 512GB @ 819 GB/s
- RTX PRO 6000 Blackwell: 96GB @ 1792 GB/s
- RTX 5090: 32GB @ 1792 GB/s
- RTX 4090: 24GB @ 1008 GB/s
- RX 7900 XTX: 24GB @ 960 GB/s
- Radeon PRO W7900: 48GB @ 864 GB/s
- AMD Radeon AI PRO R9700: 32GB @ 640 GB/s
- Intel Arc Pro B65: 32GB @ ~608 GB/s
- Tenstorrent Wormhole n300: 24GB @ 576 GB/s
- Tenstorrent Blackhole p150: 32GB @ 512 GB/s + 800G
- MacBook Pro M5 Max: 460-614 GB/s
- MacBook Pro M5 Pro: 307 GB/s
- DGX Spark: 128GB @ 273 GB/s (coherent + CUDA)
- Mac mini M4 Pro: 273 GB/s
- Ryzen AI Max / Strix Halo: ~256 GB/s (~96GB usable GPU)
- MacBook Air M5: 153 GB/s
- Snapdragon X2 Elite: 152-228 GB/s
- Intel Lunar Lake: 136 GB/s
- Snapdragon X Elite: 135 GB/s
- Mac mini M4: 120 GB/s
- Arc Pro B60: 24GB @ ~456 GB/s

Verdict
- GPUs are still the bandwidth kings
- Apple wins: when you need stupid amounts of memory and don’t want to shard across GPUs
- Apple loses: when raw tokens/sec & concurrency matter more
- DGX Spark: coherent memory + NVIDIA stack
- Strix Halo / Ryzen AI Max: first real x86 unified-memory contender
- Tenstorrent: fully OSS stack, excited to see this mature

Fitting ≠ serving

Even if it fits, you still pay for
- bandwidth during decode
- KV cache growth
- dequantization
- batching + concurrency
- scheduler quality
- framework overhead

The only mental model that matters:

1. What must fit?
2. What bandwidth tier do I need?
3. What software stack can actually deliver it?

In short:
- NVIDIA → fastest raw speed
- Apple Studio M3 Ultra → biggest one-box memory
- Strix Halo → first real x86 unified
- DGX Spark → coherent NVIDIA dev appliance
- AMD / Intel Arc → rising alternatives
- Tenstorrent → fully open-source stack

Do ask: “which bottleneck am I buying?”

Not: “which hardware is best?”
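The first two questions in that mental model (what must fit, what bandwidth tier) can be roughed out with back-of-the-envelope arithmetic. The sketch below is not from the post. It assumes the common rule of thumb that single-stream decode is memory-bound, so tokens/sec is capped at bandwidth divided by the bytes of weights read per token. The 70B model, the 4-bit quantization, and the 8 GB overhead are hypothetical inputs, and the result is a ceiling that ignores every "fitting ≠ serving" cost listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str
    memory_gb: float      # capacity, taken from the list above
    bandwidth_gbs: float  # memory bandwidth, taken from the list above


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB: params * bits / 8 bytes."""
    return params_billion * bits_per_weight / 8


def fits(box: Box, model_gb: float, overhead_gb: float = 0.0) -> bool:
    """Capacity check: weights plus an assumed KV-cache/runtime overhead."""
    return model_gb + overhead_gb <= box.memory_gb


def decode_ceiling_tps(box: Box, active_gb: float) -> float:
    """Upper bound on single-stream decode tokens/sec.

    Assumes every active weight byte is read once per generated token,
    so tokens/sec <= bandwidth / bytes read per token.
    """
    return box.bandwidth_gbs / active_gb


boxes = [
    Box("RTX 5090", 32, 1792),
    Box("Mac Studio M3 Ultra", 512, 819),
    Box("DGX Spark", 128, 273),
]

# Hypothetical dense 70B model at 4 bits per weight (35 GB of weights).
model_gb = weights_gb(70, 4)

for box in boxes:
    if fits(box, model_gb, overhead_gb=8):  # 8 GB overhead is an assumption
        print(f"{box.name}: fits, ceiling ~{decode_ceiling_tps(box, model_gb):.1f} tok/s")
    else:
        print(f"{box.name}: does not fit in one box")
```

With these inputs the fastest card in the list is ruled out on capacity alone, which is the post's point: the bottleneck you are buying depends on the model you need to fit.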

206

156K

0xNeoArch retweeted

Thomas Wolf

@Thom_Wolf

1 day ago

Desert island survival list:
✅ Solar panel / battery
✅ 256 GB Mac Studio
✅ GLM 5.2

Civilization in a backpack

268

37K

0xNeoArch retweeted

Akshay 🚀

@akshay_pachaar

2 days ago

Web scraping will never be the same. (100% open-source visual search at scale)

PixelRAG is a retrieval system that skips HTML parsing completely. Instead of scraping a page into text and embedding chunks, it screenshots the page and retrieves the image. A vision-language model reads the answer straight off the pixels.

Why that matters: parsing is where web RAG quietly loses information.

- A single HTML-to-text parser can drop 40%+ of a page.
- Tables, charts, and layout get flattened or thrown out.
- Swapping parsers alone can move accuracy ~10 points on the same docs.

PixelRAG indexes the page a person actually sees. The team built a visual index of all of Wikipedia, 30M+ screenshots, and it still beats the strongest text RAG baseline by 18.1% on text-only QA.

The repo also ships a Claude Code plugin that gives Claude eyes. It lets Claude screenshot any URL and read the rendered page instead of scraping the DOM. So you can hand it a live page, an arXiv paper, or your local site and ask what it actually looks like. One setup script. No MCP server, no backend.

How the pipeline works:

- Renders each document (web, PDF, image) to image tiles.
- Embeds them with Qwen3-VL-Embedding, LoRA fine-tuned on screenshots.
- Builds a FAISS index and serves a search API.

A stronger reader model lifts accuracy with no re-indexing, since the index is just pixels.

Everything is open-source under Apache-2.0.

GitHub repo: https://t.co/qun9TjAdmw

Talking about RAG, I recently wrote an article on a new approach that makes retrieval much more efficient by cutting corpus size by 40x, reducing tokens per query by 3x, and improving vector search relevance by 2.3x. The article is quoted below.
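The shape of that pipeline (tiles in, embeddings indexed, a separate reader on top) can be illustrated with a toy stand-in. This is not PixelRAG's code. The hand-written three-dimensional vectors stand in for vision embeddings, brute-force cosine search stands in for the FAISS index, and `Tile`, `search`, and `answer` are hypothetical names. What it tries to show is why swapping the reader needs no re-indexing: the reader only ever sees the retrieved tiles.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    doc_id: str
    tile_no: int
    embedding: tuple  # toy stand-in for a vision-embedding vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(index, query_vec, k=2):
    """Brute-force top-k by cosine similarity (an ANN index would replace this)."""
    ranked = sorted(index, key=lambda t: cosine(t.embedding, query_vec), reverse=True)
    return ranked[:k]


def answer(question, tiles, reader):
    """The reader is a pluggable callable, so changing it never touches the index."""
    return reader(question, tiles)


# Toy index: in a real system each embedding would come from a rendered screenshot tile.
index = [
    Tile("wiki/Paris", 0, (0.9, 0.1, 0.0)),
    Tile("wiki/Paris", 1, (0.7, 0.3, 0.1)),
    Tile("wiki/Tokyo", 0, (0.1, 0.9, 0.2)),
]

hits = search(index, query_vec=(1.0, 0.0, 0.0), k=2)


def cite_tiles(question, tiles):
    """Placeholder reader: a vision-language model would read the tile images here."""
    return [f"{t.doc_id}#{t.tile_no}" for t in tiles]


print(answer("capital of France?", hits, cite_tiles))
```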

117

786

11K

767K

0xNeoArch retweeted

Daniel Nguyen

@daniel_nguyenx

3 days ago

Don’t use Kindle to run codex. Try a Boox instead.

It runs Android OS, so you can just use the official ChatGPT app.

My top 3 E Ink devices for Codex / ChatGPT:

- Note Air 5C
- Go 7 B/W
- Palma 2 B/W

Note Air 5C:

- Pros: large screen, very fast refresh rate, smooth scrolling, minimal flashing/ghosting, great for notes, docs, planning, ChatGPT, Codex output
- Cons: darker screen because it has a Kaleido color layer. Not a big deal, but it’s not as crisp as the Kindle. It’s also pretty pricey at ~$529

A more portable option is the Boox Go 7 (reader shape) or the Palma 2 (phone shape).

Software-wise they are the same as the Note Air 5C, but you get a monochrome display → crispy text and better battery life.

Demo:

- Left window: codex in the official ChatGPT app
- Right: browser tab with serve-sim so I can see changes in the iOS simulator
- Input: no keyboard needed. I use an AI dictation keyboard and it works very well.

For some reason the built-in voice input in the ChatGPT app never works. Someone from the codex android team please fix 🥲

16K

0xNeoArch retweeted

Jun Song

@jun_song

2 days ago

The biggest side hustle trend for late 2026: Setting up a small data center rack in your garage and selling AI inference. Bookmark this post and come back to it later.

291

216

19K

0xNeoArch

@0xNeoArch

2 days ago

Best Claude update ever 😂😂😂

0xNeoArch retweeted

0xSero

@0xSero

2 days ago

They called me crazy. Zai is cracked beyond belief

670

171K

0xNeoArch retweeted

0xSero

@0xSero

2 days ago

Let me make your life better.

Sit down and turn all your work into goals/loops, connect your phone with codex or dispatch/code or droid computer.

Go outside and enjoy nature. If you’re curious, drop in and give guidance.

Example for web developers:

- /goal go over every single feature in this app, create a user story with expected behaviour based on the code, and keep a single canonical spreadsheet tracking each feature's status
- when done, switch the loop to testing every user story and documenting all errors
- when done, fix every logistical error or UX error
- test every user behaviour again post fix

Give the model computer use, and give it full permissions on some computer.

Example for researchers:

- /loop create a continuous workflow to research documents and tests related to expert pruning and merging
- for each of those, create a plan that can run on 2 GPUs and do a full experiment which helps increase retained expert saliency post pruning
- once done, run 50 tests against previous prompts which lead to model degeneracy, attractor collapse, and overthinking
- at temperature 0, measure if any improvements were made without hurting success on coding
- you have 8x B200s which must never be idle: curate a list of experiments to do constantly, and every cycle, if GPUs are idle and I’ve not given guidance, run and maintain an experiment.

850

33K

0xNeoArch

@0xNeoArch

3 days ago

@loktar00 Will GLM 5.2 Q2_xxxs run smoothly without offloading on 8x or 10x 3090s?

736

0xNeoArch retweeted

0xJeff

@0xJeff

28 days ago

https://t.co/7saJAy1VNw

645

115K

0xNeoArch retweeted

AVB

@AvbNear

3 days ago

Crazy if true!

The challenge with 'decentralised' AI is that a single provider (often an individual) very rarely has a machine or GPUs powerful enough to serve the large models that are actually in demand.

By sharding the model and aggregating compute across several providers, you enable both larger models to be served and smaller providers to contribute in a decentralised way.

Not the first ones to have this vision, but the most progress I've seen on this front so far. Kudos
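One simple way to picture sharding across providers is pipeline-style partitioning: give each provider a contiguous run of layers sized to its free memory. The post does not say how the project in question shards, so the sketch below is purely illustrative. `shard_layers`, the provider names, and the 80-layer, 1.5 GB-per-layer model are all hypothetical.

```python
def shard_layers(n_layers, layer_gb, providers):
    """Assign contiguous layer ranges to providers, bounded by each one's free memory.

    providers: list of (name, free_gb) pairs, in pipeline order.
    Returns {name: range_of_layers}; raises ValueError if the pool is too small.
    """
    plan = {}
    start = 0
    for name, free_gb in providers:
        if start >= n_layers:
            break                              # model fully placed
        capacity = int(free_gb // layer_gb)    # whole layers this provider can hold
        if capacity == 0:
            continue                           # too small to hold even one layer
        end = min(n_layers, start + capacity)
        plan[name] = range(start, end)
        start = end
    if start < n_layers:
        raise ValueError(f"pool too small: {n_layers - start} layers unplaced")
    return plan


# Hypothetical 80-layer model at 1.5 GB per layer (120 GB total),
# which no single provider below could serve alone.
providers = [("alice", 24), ("bob", 48), ("carol", 64)]
plan = shard_layers(80, 1.5, providers)

for name, layers in plan.items():
    print(f"{name}: layers {layers.start}-{layers.stop - 1}")
```

A real system would also have to account for activation traffic between providers, which this capacity-only split ignores.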

11K

0xNeoArch retweeted

0xSero

@0xSero

3 days ago

Exo

No more worrying about it.

Best model, best config, best harness for your hardware, based on local . ai benchmarks.

Just: exo run

340

279

42K

0xNeoArch

@0xNeoArch

3 days ago

DeusData/codebase-memory-mcp (+2,308 stars today)

AI coding agents are terrible at understanding large codebases because they read files one by one and run out of context fast. This tool indexes your entire project into a persistent knowledge graph in milliseconds, so agents can trace function calls and dependencies in a single query instead of scanning thousands of files. One team cut their token usage by 99% just by switching to it.

https://t.co/IEl630ODmc
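To make the "trace function calls in a single query" idea concrete, here is a minimal sketch of a call graph built with Python's standard `ast` module. It is not how codebase-memory-mcp works internally. It is an in-memory toy for a single Python source string, with no persistence, and `build_call_graph` and `transitive_callees` are hypothetical names.

```python
import ast
from collections import deque


def build_call_graph(source: str) -> dict:
    """Map each top-level function to the plain names it calls directly."""
    graph = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            calls = graph.setdefault(node.name, set())
            for child in ast.walk(node):
                # Only direct name calls like f(x); method calls like obj.f() are skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.add(child.func.id)
    return graph


def transitive_callees(graph: dict, start: str) -> set:
    """Everything reachable from `start` by following call edges (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        fn = queue.popleft()
        for callee in graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def run(path):
    return parse(load(path))
'''

graph = build_call_graph(SOURCE)
print(sorted(transitive_callees(graph, "run")))
```

The answer comes from one graph traversal rather than from re-reading each file, which is the token saving the post is pointing at.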

0xNeoArch

@0xNeoArch
