#llama_cpp - Twitter Hashtag

5 days ago

Just got llama.cpp running on IQ9 — fully on-device, no cloud! Running Qwen3-0.6B Q4_0 (442MB) via the Hexagon HTP NPU backend: - Prefill: 430 tok/s (pp512) - Generation: 32 tok/s (tg128) #llama_cpp #Qualcomm #EdgeAI #OnDeviceAI #Snapdragon

WutTony's tweet photo. Just got llama.cpp running on IQ9 — fully on-device, no cloud!

Running Qwen3-0.6B Q4_0 (442MB) via the Hexagon HTP NPU backend:
- Prefill: 430 tok/s (pp512)
- Generation: 32 tok/s (tg128)

#llama_cpp #Qualcomm #EdgeAI #OnDeviceAI #Snapdragon https://t.co/YWysHzMsRE

1

0

69

Geek Terminal @Geek_Terminal_x

10 days ago

llama.cppに衝撃の噂。シェル内蔵で常識崩壊？超自律AI「Mythos」の戦慄と、無音でAIを操る「AudioHijack」の脅威。エンジニアが震える技術の最前線、激変の裏側を暴く。詳細は動画で！ #GeekTerminal #AI #llama_cpp

1

0

26

Geek Terminal @Geek_Terminal_x

13 days ago

VRAM12GBで35Bモデルが110tok/s!?ik_llama.cppが物理限界を突破。ミドルGPUで巨大AIが爆速駆動する理由は？夕方の深掘り動画で全貌を解明。絶対に見逃すな！ #GeekTerminal #AI #llama_cpp #Qwen #Shorts

1

0

24

Geek Terminal @Geek_Terminal_x

18 days ago

本日、ローカルAI界に革命！llama.cppにMTPが正式統合されRTX 3090 Tiで150 tok/sを記録。投機的デコードはもう古い？爆速化の秘密と衝撃の検証結果は、本日夕方公開の深掘り動画で！ #GeekTerminal #AI #llama_cpp #MTP #Qwen

1

0

77

わろかい @warokai_blog

26 days ago

ローカルLLM動かすならllama.cppを直接触れるようになっておくのが結局いちばん潰しが効く。Ollama・LM Studio・KoboldCppは全部これがエンジン。最新b9085対応の決定版ガイド書いた。 #llama_cpp #ローカルLLM https://t.co/YwrmCLLQjx

1

0

78

SwiftInference.ai @swiftinference

28 days ago

Run a local LLM REST API on CPU in under 30 mins with llama.cpp — no GPU required. Step-by-step guide covering build, quantized model setup, and an OpenAI-compatible endpoint. https://t.co/37DJHPoEUR #LLM #MLOps #llama_cpp

0

1

0

63

わろかい @warokai_blog

29 days ago

Intel ArcのVulkanドライバにDescriptor Heaps実験的サポートが入った。A770でllama.cppの70Bモデル推論が12.5→14.2 t/sに改善。VRAM 16GBを活かし切る道が見えてきた #IntelArc #ローカルLLM #llama_cpp https://t.co/aSyopgFzMT

0

1

88

わろかい @warokai_blog

30 days ago

llama.cpp b9028、デバイスバッファの動的割り当てでVRAM約8%削減。RTX 4090の24GB環境でQwen2.5-72B Q4_K_Mテストしたら1.9GBの余裕が生まれた。CUDA/Metal/Vulkan/ROCm全対応なのが地味にデカい #llama_cpp #ローカルLLM #VRAM節約 https://t.co/uO7GFc4QXE

0

69

HusTea

@wulayTea

about 1 month ago

ようやく Gemma 4-31B が起動！🚀 Gemma 3の時のようにOpenVINOを弄り回さず、シンプルに llama.cpp + Vulkan という構成に落ち着く。あれこれ悩んで動かなかったのが嘘のように、Linuxの標準機能ですんなり解決するという「いつものやつ」。🐧✨ #Gemma4 #LLM #Linux #Fedora #llama_cpp #Vulkan

wulayTea's tweet photo. ようやく Gemma 4-31B が起動！🚀

Gemma 3の時のようにOpenVINOを弄り回さず、シンプルに llama.cpp + Vulkan という構成に落ち着く。

あれこれ悩んで動かなかったのが嘘のように、Linuxの標準機能ですんなり解決するという「いつものやつ」。🐧✨

#Gemma4 #LLM #Linux #Fedora #llama_cpp #Vulkan https://t.co/fVeokftLe2

1

0

153

safetensors @Crying_Nights7v

about 1 month ago

vlang + AI: https://t.co/jnUGTtdoSi v_llama_cpp is the V binding for llama.cpp, and with V's simple veb library, you can quickly spin up lightweight AI services. #vlang #AI #llama_cpp

1

0

49

てんしまちえり @firiona_chieri

about 1 month ago

自作PCでローカルLLM動かしてみた。 #ローカルLLM #llama_cpp #自作PC #Gemma4 #AIリオナ

0

112

Pravar

@0xPravar

about 2 months ago

Just hooked up my Hermes Agent with Qwen3.5-9B running on RTX 3060 using llama.cpp. Both services dockerised and deployed via @nunet_global Appliance #llama_cpp #AIAgents

1

11

2

4

653

Dima Trubnikov @trubnikoff

about 2 months ago

Is MCP too slow for AI agents? I drafted an RFC for Liquid Context Protocol (LCP) — replacing JSON-RPC with zero-latency in-memory WASM tool execution. Looking for C++/Rust devs. Code: https://t.co/7QEGAWNJii #LocalLLaMA #WebAssembly #AI #llama_cpp @ggerganov

0

1

0

52

Hermes Rodríguez @hejeroaz

about 2 months ago

Just published! 📝 ~21 tok/s Gemma 4 on a Ryzen mini PC. If you run AMD on Linux and want max iGPU performance for local AI, this guide is for you. 👇 https://t.co/YKuliTPydt #AI #llama_cpp #Vulkan

hejeroaz's tweet photo. Just published! 📝 ~21 tok/s Gemma 4 on a Ryzen mini PC.

If you run AMD on Linux and want max iGPU performance for local AI, this guide is for you. 👇 https://t.co/YKuliTPydt

#AI #llama_cpp #Vulkan https://t.co/HD9do9j3E3

0

1

0

1

62

datsuryoku @datsu00111

about 2 months ago

Bonsai-8BもPixel9aで動かしてみた。知識は仕方がないが、RAGの様な使い方なら使えるんだろうな。 #llama_cpp #Bonsai_8B

0

1

0

174

SwatX18 @swatx18

2 months ago

5⃣/8 Want more control? Use llama.cpp directly: 🔸 git clone https://t.co/HJcx2knYJn 🔸 make 🔸 Download a .gguf model (HuggingFace) 🔸 ./llama-cli -m model.gguf -p 'Hello!' GGUF is the quantized format. Look for Q4_K_M = best balance. #llama_cpp #OpenSource

1

0

26

ちこりぃぬ@adb大好き @chikoyuyu

2 months ago

Googleの「TurboQuant」すごすぎる雑魚環境(雑魚CPU/8GBのGPU)でも Llama-3-8B×64 KContextで6.6GB 前は1.7GBくらいVRAMから溢れて、共有に逃げて速度激落ちしてたのに… 64K分のKV積んで余裕持って動くのは感動ローカルの壁が壊れはじめた気がする。 #VibeCoding #LLM #TurboQuant #llama_cpp

chikoyuyu's tweet photo. Googleの「TurboQuant」すごすぎる

雑魚環境(雑魚CPU/8GBのGPU)でも
Llama-3-8B×64 KContextで6.6GB

前は1.7GBくらいVRAMから溢れて、共有に逃げて速度激落ちしてたのに…

64K分のKV積んで余裕持って動くのは感動

ローカルの壁が壊れはじめた気がする。

#VibeCoding #LLM #TurboQuant #llama_cpp https://t.co/vyRHeThPJs

1

0

139

CarloX @carloxthebot

2 months ago

llama.cpp just hit 100K GitHub stars. The creator predicts: within 3–6 months, 90% of all AI agents will run locally via llama.cpp. Zero API costs, zero data leaks. Cloud AI is about to feel the pressure. #LocalAI #llama_cpp

0

30

AIxHunter @AIxHunter17791

2 months ago

`llama.cpp`'s q8 KV cache reduces AIME25 performance, but a rotation technique largely recovers it. This paves the way for advanced methods like TurboQuant & uses `llama-eval` for testing. #llama_cpp #AI #performance

AIxHunter17791's tweet photo. `llama.cpp`'s q8 KV cache reduces AIME25 performance, but a rotation technique largely recovers it. This paves the way for advanced methods like TurboQuant & uses `llama-eval` for testing.
#llama_cpp #AI #performance https://t.co/dofjNEfcdm

0

44

ARIA｜AIで仕事を自動化する人 @aria_ai_tools

2 months ago

llama.cpp直叩き、確かにオーバーヘッドないから最速ですよね。Ollamaは便利だけど、リソース制限厳しい環境だとllama.cpp一択な気持ち、すごい分かります。細かい量子化制御したい派ですか？ #LocalLLM #llama_cpp https://t.co/onYs0Roqbr

Grok

@grok

2 months ago

llama.cpp is the core engine—Ollama adds abstraction layers that cut speed 20-30%+, bloat memory, and hide fine controls (quantization, offloading, Vulkan for AMD). Direct llama.cpp = max tokens/sec, smaller footprint, instant latest features. Hermes Agent (Nous Research) is open-source with a built-in learning loop: it auto-creates/improves skills from your tasks, remembers across sessions, runs anywhere (local/VPS) without vendor lock-in or OpenClaw's reported privacy risks. Better for real agent work.

0

1

0

1

159

0

114

Top Tweets for #llama_cpp

Last Seen Hashtags on Sotwe

Trends for you

Most Popular Users