passer

@passer_81

Joined April 2009

959 Following

54 Followers

3.6K Posts

passer_81 retweeted

向阳乔木

@vista8

about 22 hours ago

想写职场、武侠、修仙等任意风格小��？可自己完全没有思路，能创作吗？必须可以！今天开源一个乔木小说创作 Skill。你只需说：“我想写一个小说” 或 “想写一个类似xxx的小说”。 AI自动给出剧情梗概，人物设定，还能把钩子、经典桥段、人物欲望、冲突升级和结尾自动处理好。跟AI讨论没问题后，再生成完整、低 AI ��的小说。小说 Skill 安装： npx skills add joeseesun/qiaomu-novel-generator Github免费开源，地址见评论区

vista8's tweet photo. 想写职场、武侠、修仙等任意风格小��？

可自己完全没有思路，能创作吗？必须可以！

今天开源一个乔木小说创作 Skill。

你只需说：“我想写一个小说” 或 “想写一个类似xxx的小说”。

AI自动给出剧情梗概，人物设定，还能把钩子、经典桥段、人物欲望、冲突升级和结尾自动处理好。

跟AI讨论没问题后，再生成完整、低 AI ��的小说。

小说 Skill 安装：
npx skills add joeseesun/qiaomu-novel-generator

Github免费开源，地址见评论区

122

181

10K

passer_81 retweeted

番茄哈猫🍐 @zuilizhishier

about 9 hours ago

假如你是一个韩国人，周末见证韩国国家队取得本届世界杯首胜，拿起手机一看韩国股市指数涨了8%，今年累计涨幅已达300%李在明宣布要把三星海力士的超额利润平分给你。人们听着 kpop，工作四天后，是三天端午小长假，大家聊的都是三星海力士世界杯。这个时候，你发现有个国家的人民在说你吃不起西瓜。

154

106

128K

passer_81 retweeted

LinearUncle

@LinearUncle

about 6 hours ago

每次国产新模型发布新版本，很多朋友不信邪，一定要浪费宝贵时间和精力去一探究竟。没有必要，外国友人会帮助你测试，他们无法收买，哪天大量说英语的X友们连连称赞，那才说明国产模型真正崛起了。否则，永远建议每个人每个月��钱只购买OpenAI或Anthropic的顶格套餐，Token自由+顶格智力，才有资格探索AI的无限可能，否则永远只是看客。

556

passer_81 retweeted

Iggie🚁

@Kenntnis22

about 21 hours ago

“春秋笔法”，这是中宣部储备人才啊😄

185

849

155

167K

Who to follow

CTFer, Student, (Rev & Pwn & Automation), DEFCON 33 Finalist Exploring System Security and Porgram Analysis. Hacking with @r3kapig and @S1uM4i for fun.

passer_81 retweeted

about 17 hours ago

🖥️ Best Local LLMs for Consumer GPUs — llama.cpp Guide (June 2026) What I actually run on consumer hardware right now. Every model below runs via llama.cpp with a simple one-liner — no Docker, no Python env, no cloud. ━━━ 8-16GB VRAM ━━━ 🔹 Gemma 4-12B (Google) • Smartest model in this size class — competes with stuff 2× bigger • Unsloth's MTP GGUFs: 162 tok/s vs 52 tok/s normal (3× speedup) • Minimum 8GB VRAM recommended for Q4_K_M quant • GGUF → https://t.co/VWp818MB3D 🔹 LFM2.5-8B-A1B (LiquidAI) • Hybrid MoE, only 1B active params — absurdly fast for its size • Perfect for 8-12GB cards, MacBooks, or anyone on a tight budget • GGUF → https://t.co/ZbOs4mXJDq ━━━ 16-32GB VRAM ━━━ 🔹 Qwen3.6-27B (Qwen) • Scored 1.00 on tool-efficiency benchmarks — best local agent available • 40 deterministic tasks, 32k/128k context needle tests — all passed • GGUF → https://t.co/n7K3sPvliE • MTP version (faster) → https://t.co/gwdfnJTzcy 🔹 Qwopus3.6-27B-v2 (Jackrong) • Best quantization of Qwen3.6-27B — topped 5 agent & coding benchmarks (1200 samples) • If you're running Q4, this is the one to grab • GGUF → https://t.co/tV1DFqXnOD • MTP version → https://t.co/PMqz7V5ewv 🔹 Gemma 4-31B QAT (Google/Unsloth) • QAT variant with MTP draft head: 76-125 tok/s (1.67× speedup) • Excellent for multi-agent / subagent workflows • GGUF → https://t.co/FgVsUX0YOB 🔹 Nex-N2-Mini (Nex AGI) • Post-train of Qwen3.5-35B-A3B — MoE with only 3B active params • Fits on 16GB+ VRAM, overflow loads from system RAM • Adaptive thinking saves ~20% tokens with no quality loss • For deep multi-step reasoning, nothing in this size comes close • GGUF → https://t.co/oyC522a8Eh ━━━ Quick Picks ━━━ • 16GB all-rounder → Gemma 4-12B with MTP GGUFs • 32GB all-rounder → Qwen3.6-27B / Qwopus-v2 • Agents & tool use → Qwen3.6-27B or Qwopus Q4 • Deep reasoning → Nex-N2-Mini (MoE, fits 16GB+) • Tight budget → LFM2.5-8B-A1B • Cheapest full build: 1× used RTX 3090 (24GB) + rest of PC ≈ $1000-1500 ━━━ Setup on Windows ━━━ 1. Download llama.cpp → https://t.co/et0J7Swua7 (latest .zip) 2. Extract to any folder (e.g. C:\llama.cpp) 3. Download a .gguf from the links above (Q4_K_M or Q5_K_M for best quality/speed balance) 4. Run one of the commands below depending on your hardware ━━━ Launch Commands ━━━ SINGLE GPU — Standard model (no MTP): llama-server.exe ^ -m C:\models\Qwen3.6-27B-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ -ngl 100 ^ -np 1 ^ --port 8080 ^ --jinja SINGLE GPU — MTP model (faster inference): llama-server.exe ^ -m C:\models\Qwen3.6-27B-MTP-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ --spec-type draft-mtp ^ --spec-draft-n-max 3 ^ -ngl 100 ^ -np 1 ^ --port 8080 ^ --jinja DUAL GPU — Split across two cards: llama-server.exe ^ -m C:\models\Qwen3.6-27B-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ -ngl 100 ^ --tensor-split 0.55,0.45 ^ --main-gpu 0 ^ -np 1 ^ --port 8080 ^ --jinja DUAL GPU + MTP + Vision (multimodal): llama-server.exe ^ -m C:\models\Qwen3.6-27B-MTP-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ --spec-type draft-mtp ^ --spec-draft-n-max 3 ^ -ngl 100 ^ --tensor-split 0.60,0.40 ^ --main-gpu 0 ^ -np 1 ^ --port 8080 ^ --jinja ^ --mmproj C:\models\mmproj-F16.gguf ━━━ Parameter Breakdown ━━━ -m <path> Path to your .gguf model file. Change this to wherever you downloaded it. --ctx-size 180000 Context window in tokens. 180k = huge context for long conversations or big codebases. Reduce to 32768 or 65536 if you don't need long context — uses less VRAM. --flash-attn on Flash Attention — dramatically speeds up inference and reduces VRAM usage. Works on RTX 30xx/40xx/50xx. Always enable this. --cache-type-k q4_0 / --cache-type-v q4_0 Quantizes the KV cache (key/value attention cache) to 4-bit. This is what makes 180k context fit in VRAM. Without it, huge contexts eat all your memory. Quality impact is minimal — this is a free performance win. --batch-size 1024 / --ubatch-size 512 batch-size = how many tokens are processed in one forward pass (throughput). ubatch-size = micro-batch actually sent to the GPU per step. Higher = faster prompt processing but needs more VRAM. If you run out of VRAM, lower these (e.g. 512/256). -ngl 100 Number of layers to offload to GPU. 100 = all layers on GPU (full offload). This is what you want if the model fits in your VRAM. If it doesn't fit, reduce this (e.g. -ngl 40) — remaining layers run on CPU/RAM. --tensor-split 0.55,0.45 How to split model layers across multiple GPUs. Values are ratios. 0.55,0.45 = GPU 0 gets 55% of layers, GPU 1 gets 45%. Adjust based on your VRAM — give more to the card with more memory. Example: 0.70,0.30 for a 24GB + 12GB setup. Not needed for single GPU setups. --main-gpu 0 Which GPU handles the batch computation (the "orchestrator"). Set to 0 (your primary GPU). The other GPU(s) handle their assigned layers. Minor performance impact — usually just leave it at 0. -np 1 Number of parallel slots (concurrent requests). 1 = one user at a time. Increase to 2-4 if you want multiple clients connected simultaneously. Each extra slot uses additional VRAM for its own KV cache. --port 8080 Which port the server listens on. Change if port 8080 is busy. --jinja Enables Jinja2 template processing — required for proper chat formatting. Most modern models expect this. Always include it. --spec-type draft-mtp Enables Multi-Token Prediction (MTP) speculative decoding. Only works with MTP GGUF models (downloaded separately). The model predicts multiple tokens at once and verifies them — big speed boost. --spec-draft-n-max 3 How many tokens the MTP draft head proposes per step. 3 is a good default. Higher = potentially faster but more VRAM and may reduce quality. --mmproj <path> Path to the multimodal projector file (for vision models). Enables image understanding — paste screenshots into the web chat. Only needed if you want vision capabilities. Omit for text-only use. ━━━ Your Hardware → Your Command ━━━ Single GPU (8-24GB VRAM): Use the "Single GPU" command. Change -m to your model path. 8GB card → Gemma 4-12B Q4 or LFM2.5-8B 12GB card → Gemma 4-12B Q5/Q6 16GB card → Gemma 4-31B QAT Q4 or Nex-N2-Mini 24GB card → Qwen3.6-27B Q4/Q5, Qwopus-v2, Gemma 4-31B QAT Q5/Q6 Dual GPU: Use the "Dual GPU" command. Adjust --tensor-split based on your VRAM ratio. 24GB + 24GB → --tensor-split 0.50,0.50 24GB + 12GB → --tensor-split 0.70,0.30 24GB + 8GB → --tensor-split 0.75,0.25 Want speed? Use MTP versions of models with the "MTP" commands. Want vision? Add --mmproj with the projector file from the model's HuggingFace repo. 5. Once running, you get: • Web chat UI → http://localhost:8080 • OpenAI-compatible API → http://localhost:8080/v1 • Playground → http://localhost:8080/playground ━━━ Why /v1 API Is the Killer Feature ━━━ One local endpoint replaces your entire cloud API bill. The /v1 endpoint is drop-in OpenAI-spec compatible — every tool that speaks OpenAI just works. No custom code, no glue layer. Works out of the box with: • IDEs: Cursor, Continue, Windsurf, Cline, Roo Code • CLI tools: aider, Open Interpreter, OpenCode • Frameworks: LangChain, LlamaIndex, LiteLLM • Any OpenAI SDK (Python, Node, Go, Rust) Why this beats cloud APIs: • 100% private — code never leaves your machine • $0 per token — no rate limits, no quotas, no surprise bills • Works fully offline • Zero telemetry, no training on your data • Swap models by dropping in a different .gguf — no app changes needed • Run 32k–128k context windows without burning money Good combos: • Cursor + Qwopus-v2 → near-frontier quality, zero API cost • Continue + Qwen3.6-27B → best local coding agent �� aider + Gemma 4-12B MTP → 162 tok/s, feels instant • OpenCode + Nex-N2-Mini → deep reasoning on 16GB Set any OpenAI-compatible client to your local endpoint: set OPENAI_API_KEY=sk-dummy (any non-empty string works) set OPENAI_BASE_URL=http://localhost:8080/v1 # every OpenAI-compatible tool now hits your local GPU Shoutouts: @0xSero @rS_alonewolf @witcheer @UnslothAI @LottoLabs

134

178K

passer_81 retweeted

Viking

@vikingmute

about 11 hours ago

昨天抽空试了一下这个方法论，真的不错，不过我不是 audit，而是做新 feature，我让 GPT-5.5 High 用这个 skill 出 plan，不写一行代码，有Metadata， Scope 和 Steps。让 Composer 2.5（X 订阅送的）和 DeepSeek v4 pro 分别出了一版本实现，效果都不错，花费非常少。如果对代码质量要求更高，我再用我自己的 review-forge https://t.co/lDHbd5Y9Je review 一下，就可以实现花小钱办大事。是一个非常好的省钱穷鬼工作流。

passer_81 retweeted

Geek Lite

@QingQ77

about 9 hours ago

给中国大陆居民提供一套从券商开户到资金进出境的美股合规实操指南，绕过 CRS 信息交换。 https://t.co/herpamMgSv

633

passer_81 retweeted

Geek

@geekbb

1 day ago

pi + DeepSeek 画的，才发现这个技能不需要生图模型，是通过 LLM 将自然语言描述转为结构化 JSON → Node.js 渲染器用纯几何算法生成 SVG → 注入自包含 HTML。 https://t.co/ClTZDcWZdr

302

471

46K

passer @passer_81

about 8 hours ago

https://t.co/aEKKxV4GHz

passer_81 retweeted

cr3ghost

@cr3ghost

2 days ago

One of the best FREE Windows exploit development and security research blogs out there. Kernel pool exploitation. PTE overwrites. HVCI and kernel CFG bypass. XFG internals. Browser type confusion. Kernel shadow stacks. Secure kernel internals. ARM64 Pointer Authentication bypass. ETW and PPL research. Covers everything from ROP fundamentals all the way to cutting edge ARM64 and VBS security research. Still actively publishing in 2026. https://t.co/tyfevXiWOp Author: @33y0re #ExploitDevelopment #WindowsInternals #ReverseEngineering

cr3ghost's tweet photo. One of the best FREE Windows exploit development and security research blogs out there. Kernel pool exploitation. PTE overwrites. HVCI and kernel CFG bypass. XFG internals. Browser type confusion. Kernel shadow stacks. Secure kernel internals. ARM64 Pointer Authentication bypass. ETW and PPL research.

Covers everything from ROP fundamentals all the way to cutting edge ARM64 and VBS security research. Still actively publishing in 2026.

https://t.co/tyfevXiWOp

Author: @33y0re

#ExploitDevelopment #WindowsInternals #ReverseEngineering

484

101

449

20K

passer_81 retweeted

javinpaul @javinpaul

2 days ago

I Found LeetCode for Software Design and It’s Awesome https://t.co/QIFbtLbO2c

733

39K

passer_81 retweeted

Jack Wotherspoon

@JackWoth98

3 days ago

https://t.co/e6enxfTNnQ

422

630

128K

passer_81 retweeted

lidang 立党（劝人卖房/学CS/买SP500/纳100/OpenAI/Anthrop第一人）

@lidangzzz

1 day ago

143

105

311K

passer_81 retweeted

InfiCheesy無限芝士

@InfiCheesy

2 days ago

https://t.co/pB56gjpx1i

699

119

296K

passer_81 retweeted

Mengxin Liu

@liumengxinfly

2 days ago

发现之前对 MTP(Multi-Token Prediction) 理解有错误，我之前以为是一次预测多个 Token 会降低整体的计算成本，但实际上在长输出过程中，输出的多个 Token 会变成输入，也还是要走一遍 Transformer，一次预测多个并不会降低计算量，反而由于多了一个 MTP 模块和验证模块计算量会上升。MTP 的主要好处一个是迫使模型主动思考更长距离的预测，模型的能力会��到提升。另一个是可以多个Token并行，降低整体的延迟。

passer_81 retweeted

Ramp Labs

@RampLabs

3 days ago

When measuring effectiveness versus cost, the frontier presents as a tradeoff rather than a single winner. Read our methodology and explore the results below: https://t.co/InjYdQeD1B

RampLabs's tweet photo. When measuring effectiveness versus cost, the frontier presents as a tradeoff rather than a single winner.

Read our methodology and explore the results below:
https://t.co/InjYdQeD1B https://t.co/WsQBpvrKx5

106

55K

passer_81 retweeted

Trail of Bits

@trailofbits

3 days ago

RSA private keys biased toward 0 bits can be factored by swapping a hard math problem for an easy one: integer factorization becomes polynomial factorization. We found hundreds of real-world keys vulnerable to this. Many traced to a type mismatch in CompleteFTP (now patched): each 32-bit limb got only 8 bits of randomness. We recovered 603 RSA and 74 DSA private keys. https://t.co/C2jcxVW9WG

810

164

415

50K

passer_81 retweeted

LinearUncle

@LinearUncle

1 day ago

@dotey 宝哥文章写的太好了，@browser @chrome基本上是我日常使用最多的功能，一些朋友嫌弃codex app卡，只用codex CLI，无意中失去了使用这么好用内置浏览器的功能。前两天我也分享了使用@chrome + codex的developer mode让codex自动抓包分析deepseek加载聊天记录逻辑。 https://t.co/i32txJf58J

passer @passer_81

2 days ago

@oran_ge 好像关心人类命运，其实是为了做广告

passer_81 retweeted

Klaith @Klaith

2 days ago

美国肿瘤协会（ACS）2026 年关于结直肠癌正常风险个人筛选的指南，读完发现最大的问题是其中的新方法国内未必可及，旧方法国内也未必可及，或者生产商众多，质量参差不齐。不过重温评估疾病筛选的基本原则，也不算浪费时间。 https://t.co/mVEVQ0pV73

passer

@passer_81

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users