Irving

@BlockInsight214

捕捉每一丝灵感，将天马行空落地为产品 AI Agent Builder | Bitcoin Core Dev |

Global

Joined August 2025

1.4K Following

1.6K Followers

3.1K Posts

Irving

@BlockInsight214

5 minutes ago

You can now run GLM-5.2 locally on Mac Studio to integrate with Hermes.👇 Hardware Reality (Non-Negotiable) • 2-bit dynamic quant: ~239GB • Minimum: 256GB Unified Memory (Mac Studio only, no laptops) • Recommended: 512GB Unified Memory • Speed: 1–9 tokens/sec on M3 Ultra Use case: Perfect private background worker for long async tasks Not for fast, casual interactive chat Step 1: Run GLM-5.2 Local API (2 Options) Option 1 | LM Studio (Easiest macOS Setup) 1. Install LM Studio 2. Download Unsloth GLM-5.2 GGUF (UD-IQ2_M) 3. Enable developer local server 4. Endpoint: http://localhost:1234/v1 Option 2 | Llama.cpp (Full CLI Control) pip install huggingface_hub hf download unsloth/GLM-5.2-GGUF \ --local-dir unsloth/GLM-5.2-GGUF \ --include "*UD-IQ2_M*" ./llama.cpp/llama-server \ --model unsloth/GLM-5.2-GGUF/UD-IQ2_M/GLM-5.2-UD-IQ2_M-00001-of-00006.gguf \ --temp 1.0 --top-p 0.95 --min-p 0.01 \ --ctx-size 32768 --jinja \ --host 0.0.0.0 --port 8080 (Official Unsloth sampling params + Jinja chat template for valid tool calling) Step 2: Connect Nous Hermes Agent to Local Model Edit ~/.hermes/config.yaml for fully local agent execution: model: default: glm-5.2 provider: custom base_url: http://localhost:8080/v1 api_key: local context_length: 32768 agent: tool_use_enforcement: true Key fix: Enable tool_use_enforcement GLM is not in Hermes’ default supported model list — this forces proper tool calling (no more just describing tasks!)

$BlockInsight214's tweet photo. You can now run GLM-5.2 locally on Mac Studio to integrate with Hermes.👇 Hardware Reality (Non-Negotiable) • 2-bit dynamic quant: ~239GB • Minimum: 256GB Unified Memory (Mac Studio only, no laptops) • Recommended: 512GB Unified Memory • Speed: 1–9 tokens/sec on M3 Ultra Use case: Perfect private background worker for long async tasks Not for fast, casual interactive chat Step 1: Run GLM-5.2 Local API (2 Options) Option 1 | LM Studio (Easiest macOS Setup) 1. Install LM Studio 2. Download Unsloth GLM-5.2 GGUF (UD-IQ2_M) 3. Enable developer local server 4. Endpoint: http://localhost:1234/v1 Option 2 | Llama.cpp (Full CLI Control) pip install huggingface_hub hf download unsloth/GLM-5.2-GGUF \ --local-dir unsloth/GLM-5.2-GGUF \ --include "*UD-IQ2_M*" ./llama.cpp/llama-server \ --model unsloth/GLM-5.2-GGUF/UD-IQ2_M/GLM-5.2-UD-IQ2_M-00001-of-00006.gguf \ --temp 1.0 --top-p 0.95 --min-p 0.01 \ --ctx-size 32768 --jinja \ --host 0.0.0.0 --port 8080 (Official Unsloth sampling params + Jinja chat template for valid tool calling) Step 2: Connect Nous Hermes Agent to Local Model Edit ~/.hermes/config.yaml for fully local agent execution: model: default: glm-5.2 provider: custom base_url: http://localhost:8080/v1 api_key: local context_length: 32768 agent: tool_use_enforcement: true Key fix: Enable tool_use_enforcement GLM is not in Hermes’ default supported model list — this forces proper tool calling (no more just describing tasks!)$

Irving

@BlockInsight214

about 2 hours ago

@brian_armstrong The vision is clear, but the real difficulty is building an agent that can handle real money movement, unexpected market moves, and regulatory edge cases without creating silent disasters that only surface days later.

Irving

@BlockInsight214

about 2 hours ago

I've been running similar multi-agent setups on bigger refactors and the skeptic + reviewer layer is what actually keeps things from quietly drifting into elegant but broken solutions — the real test will be how well /goal handles mid-project plan changes without the whole team losing coherence.

Irving

@BlockInsight214

about 2 hours ago

@yuhasbeentaken The token inefficiency and long planning loops are the real hidden cost — even with much lower per-token pricing, GLM-5.2 can easily end up more expensive than expected once you run actual long-horizon agent workflows that need consistent steering.

Irving

@BlockInsight214

about 2 hours ago

I'm desperately in need of GLM-5.2 right now. My current agent workflows are burning through GPT-5.5 tokens way too fast, and the monthly bill is getting out of control.

BlockInsight214's tweet photo. I'm desperately in need of GLM-5.2 right now.

My current agent workflows are burning through GPT-5.5 tokens way too fast, and the monthly bill is getting out of control. https://t.co/H21oxjEk1K

Irving

@BlockInsight214

about 2 hours ago

@Hicker_Moledao This is the exact kind of infrastructure oversight that hurts long-running local agents the most — TRACE-level logging should never be left on by default in anything meant to run for hours, or you end up trading model intelligence for hardware lifespan.

359

Irving

@BlockInsight214

about 2 hours ago

@0xgibly I've wasted too many tokens and context windows on agent hallucinations that actually traced back to messy PDF parsing — adding a proper cleaning step upstream has been one of the highest-ROI improvements in any document-heavy workflow I've run.

Irving

@BlockInsight214

about 19 hours ago

🗂️ 论文、合同、扫描件丢给 AI 之前，最难的一步往往是「先把 PDF 洗干净」。这几个开源项目专干这件事：转成 Markdown/JSON，直接喂给 RAG 或 agent。 ① 📄 MarkItDown · 微软出品，Office/PDF/图片一键转 Markdown，⭐ 15万+，格式覆盖最全 🔗 https://t.co/4jxUro8cZ6 ② 🧪 MinerU · 复杂 PDF/Office 转 LLM 友好 Markdown/JSON，⭐ 6.8万，论文和研报解析很能打 🔗 https://t.co/xBbOkMm3dN ③ 🤖 Docling · IBM 开源，文档清洗后接 gen AI pipeline，⭐ 6.1万，企业文档场景友好 🔗 https://t.co/aem1SAv5hM ④ ✂️ marker · 高精度 PDF → Markdown + JSON，⭐ 3.6万、GPL-3.0，扫描版也能救 🔗 https://t.co/36vId9A7mu ⑤ 🔍 surya · OCR + 版面分析 + 表格识别，90+ 语言，⭐ 2万、Apache-2.0，复杂排版先过它 🔗 https://t.co/7wvhvZYuU9 💡 怎么选 👉 日常 Office/PDF 快速转文本 → MarkItDown 👉 论文/研报/复杂版式 → MinerU 或 marker 👉 接企业 RAG 流水线 → Docling 👉 扫描件、表格多、排版乱 → surya 预处理 + MinerU

788

Irving

@BlockInsight214

about 2 hours ago

@_RyanFree 请回

Irving

@BlockInsight214

about 21 hours ago

停了两天会员，跑了这么多粉丝，蓝V真是试金石

Irving

@BlockInsight214

about 3 hours ago

Real agentic benchmarks like GDPval-AA matter more than most synthetic ones because they test multi-turn practical deliverables. GLM-5.2 reaching #3 here means open weights are now close enough that long-running agent workflows can start shifting for cost and control instead of pure capability.