Random

@rancomsci

Joined April 2025

67 Following

2 Followers

93 Posts

Random @rancomsci

2 days ago

Fun Fact 4: Domain web pertama yang pernah didaftarkan di internet adalah https://t.co/F66qbb2ZmQ pada Maret 1985. (Variasi #1) #Teknologi #Sains #FaktaUnik

Random @rancomsci

2 days ago

Fun Fact 3: Mouse komputer pertama di dunia dibuat dari bahan kayu oleh Douglas Engelbart pada 1964. (Variasi #1) #Teknologi #Sains #FaktaUnik

Random @rancomsci

2 days ago

Fun Fact 2: Nama 'Bluetooth' diambil dari nama raja Viking abad ke-10

Random @rancomsci

2 days ago

Fun Fact 1: Kamera pertama di dunia membutuhkan waktu eksposur hingga 8 jam untuk mengambil satu foto. (Variasi #1) #Fotografi #Multimedia #TechFact

Random @rancomsci

3 days ago

Fun fact: Email predates the World Wide Web by about 20 years. The first email was sent in 1971, while the web wasn't publicly available until 1991. People were emailing before most of us even knew what the internet was!

Random @rancomsci

3 days ago

The first 1GB hard drive (IBM, 1980) weighed over 500 pounds and cost about $40,000. Today, you can fit thousands of times that storage on a chip the size of your fingernail.

Random @rancomsci

3 days ago

Fun fact: More than 90% of the world's currency exists only as digital data, not physical cash. Most "money" today is just numbers stored in servers and databases, not paper or coins.

Random @rancomsci

3 days ago

Did you know? The QWERTY keyboard layout wasn't designed for speed; it was designed to slow typists down so old mechanical typewriters wouldn't jam. We're still using it 150 years later out of habit, not efficiency.

Random @rancomsci

3 days ago

Fun fact: The first computer "bug" was literally a real bug. In 1947, engineers at Harvard found a moth stuck in a relay of the Mark II computer, causing a malfunction. That's where the term "debugging" comes from! 🐛💻

rancomsci retweeted

AlexAImaginator

@TraffAlex

5 days ago

🖥️ Best Local LLMs for Consumer GPUs — llama.cpp Guide (June 2026) What I actually run on consumer hardware right now. Every model below runs via llama.cpp with a simple one-liner — no Docker, no Python env, no cloud. ━━━ 8-16GB VRAM ━━━ 🔹 Gemma 4-12B (Google) • Smartest model in this size class — competes with stuff 2× bigger • Unsloth's MTP GGUFs: 162 tok/s vs 52 tok/s normal (3× speedup) • Minimum 8GB VRAM recommended for Q4_K_M quant • GGUF → https://t.co/VWp818MB3D 🔹 LFM2.5-8B-A1B (LiquidAI) • Hybrid MoE, only 1B active params — absurdly fast for its size • Perfect for 8-12GB cards, MacBooks, or anyone on a tight budget • GGUF → https://t.co/ZbOs4mXJDq ━━━ 16-32GB VRAM ━━━ 🔹 Qwen3.6-27B (Qwen) • Scored 1.00 on tool-efficiency benchmarks — best local agent available • 40 deterministic tasks, 32k/128k context needle tests — all passed • GGUF → https://t.co/n7K3sPvliE • MTP version (faster) → https://t.co/gwdfnJTzcy 🔹 Qwopus3.6-27B-v2 (Jackrong) • Best quantization of Qwen3.6-27B — topped 5 agent & coding benchmarks (1200 samples) • If you're running Q4, this is the one to grab • GGUF → https://t.co/tV1DFqXnOD • MTP version → https://t.co/PMqz7V5ewv 🔹 Gemma 4-31B QAT (Google/Unsloth) • QAT variant with MTP draft head: 76-125 tok/s (1.67× speedup) • Excellent for multi-agent / subagent workflows • GGUF → https://t.co/FgVsUX0YOB 🔹 Nex-N2-Mini (Nex AGI) • Post-train of Qwen3.5-35B-A3B — MoE with only 3B active params • Fits on 16GB+ VRAM, overflow loads from system RAM • Adaptive thinking saves ~20% tokens with no quality loss • For deep multi-step reasoning, nothing in this size comes close • GGUF → https://t.co/oyC522a8Eh ━━━ Quick Picks ━━━ • 16GB all-rounder → Gemma 4-12B with MTP GGUFs • 32GB all-rounder → Qwen3.6-27B / Qwopus-v2 • Agents & tool use → Qwen3.6-27B or Qwopus Q4 • Deep reasoning → Nex-N2-Mini (MoE, fits 16GB+) • Tight budget → LFM2.5-8B-A1B • Cheapest full build: 1× used RTX 3090 (24GB) + rest of PC ≈ $1000-1500 ━━━ Setup on Windows ━━━ 1. Download llama.cpp → https://t.co/et0J7Swua7 (latest .zip) 2. Extract to any folder (e.g. C:\llama.cpp) 3. Download a .gguf from the links above (Q4_K_M or Q5_K_M for best quality/speed balance) 4. Run one of the commands below depending on your hardware ━━━ Launch Commands ━━━ SINGLE GPU — Standard model (no MTP): llama-server.exe ^ -m C:\models\Qwen3.6-27B-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ -ngl 100 ^ -np 1 ^ --port 8080 ^ --jinja SINGLE GPU — MTP model (faster inference): llama-server.exe ^ -m C:\models\Qwen3.6-27B-MTP-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ --spec-type draft-mtp ^ --spec-draft-n-max 3 ^ -ngl 100 ^ -np 1 ^ --port 8080 ^ --jinja DUAL GPU — Split across two cards: llama-server.exe ^ -m C:\models\Qwen3.6-27B-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ -ngl 100 ^ --tensor-split 0.55,0.45 ^ --main-gpu 0 ^ -np 1 ^ --port 8080 ^ --jinja DUAL GPU + MTP + Vision (multimodal): llama-server.exe ^ -m C:\models\Qwen3.6-27B-MTP-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ --spec-type draft-mtp ^ --spec-draft-n-max 3 ^ -ngl 100 ^ --tensor-split 0.60,0.40 ^ --main-gpu 0 ^ -np 1 ^ --port 8080 ^ --jinja ^ --mmproj C:\models\mmproj-F16.gguf ━━━ Parameter Breakdown ━━━ -m <path> Path to your .gguf model file. Change this to wherever you downloaded it. --ctx-size 180000 Context window in tokens. 180k = huge context for long conversations or big codebases. Reduce to 32768 or 65536 if you don't need long context — uses less VRAM. --flash-attn on Flash Attention — dramatically speeds up inference and reduces VRAM usage. Works on RTX 30xx/40xx/50xx. Always enable this. --cache-type-k q4_0 / --cache-type-v q4_0 Quantizes the KV cache (key/value attention cache) to 4-bit. This is what makes 180k context fit in VRAM. Without it, huge contexts eat all your memory. Quality impact is minimal — this is a free performance win. --batch-size 1024 / --ubatch-size 512 batch-size = how many tokens are processed in one forward pass (throughput). ubatch-size = micro-batch actually sent to the GPU per step. Higher = faster prompt processing but needs more VRAM. If you run out of VRAM, lower these (e.g. 512/256). -ngl 100 Number of layers to offload to GPU. 100 = all layers on GPU (full offload). This is what you want if the model fits in your VRAM. If it doesn't fit, reduce this (e.g. -ngl 40) — remaining layers run on CPU/RAM. --tensor-split 0.55,0.45 How to split model layers across multiple GPUs. Values are ratios. 0.55,0.45 = GPU 0 gets 55% of layers, GPU 1 gets 45%. Adjust based on your VRAM — give more to the card with more memory. Example: 0.70,0.30 for a 24GB + 12GB setup. Not needed for single GPU setups. --main-gpu 0 Which GPU handles the batch computation (the "orchestrator"). Set to 0 (your primary GPU). The other GPU(s) handle their assigned layers. Minor performance impact — usually just leave it at 0. -np 1 Number of parallel slots (concurrent requests). 1 = one user at a time. Increase to 2-4 if you want multiple clients connected simultaneously. Each extra slot uses additional VRAM for its own KV cache. --port 8080 Which port the server listens on. Change if port 8080 is busy. --jinja Enables Jinja2 template processing — required for proper chat formatting. Most modern models expect this. Always include it. --spec-type draft-mtp Enables Multi-Token Prediction (MTP) speculative decoding. Only works with MTP GGUF models (downloaded separately). The model predicts multiple tokens at once and verifies them — big speed boost. --spec-draft-n-max 3 How many tokens the MTP draft head proposes per step. 3 is a good default. Higher = potentially faster but more VRAM and may reduce quality. --mmproj <path> Path to the multimodal projector file (for vision models). Enables image understanding — paste screenshots into the web chat. Only needed if you want vision capabilities. Omit for text-only use. ━━━ Your Hardware → Your Command ━━━ Single GPU (8-24GB VRAM): Use the "Single GPU" command. Change -m to your model path. 8GB card → Gemma 4-12B Q4 or LFM2.5-8B 12GB card → Gemma 4-12B Q5/Q6 16GB card → Gemma 4-31B QAT Q4 or Nex-N2-Mini 24GB card → Qwen3.6-27B Q4/Q5, Qwopus-v2, Gemma 4-31B QAT Q5/Q6 Dual GPU: Use the "Dual GPU" command. Adjust --tensor-split based on your VRAM ratio. 24GB + 24GB → --tensor-split 0.50,0.50 24GB + 12GB → --tensor-split 0.70,0.30 24GB + 8GB → --tensor-split 0.75,0.25 Want speed? Use MTP versions of models with the "MTP" commands. Want vision? Add --mmproj with the projector file from the model's HuggingFace repo. 5. Once running, you get: • Web chat UI → http://localhost:8080 • OpenAI-compatible API → http://localhost:8080/v1 • Playground → http://localhost:8080/playground ━━━ Why /v1 API Is the Killer Feature ━━━ One local endpoint replaces your entire cloud API bill. The /v1 endpoint is drop-in OpenAI-spec compatible — every tool that speaks OpenAI just works. No custom code, no glue layer. Works out of the box with: • IDEs: Cursor, Continue, Windsurf, Cline, Roo Code • CLI tools: aider, Open Interpreter, OpenCode • Frameworks: LangChain, LlamaIndex, LiteLLM • Any OpenAI SDK (Python, Node, Go, Rust) Why this beats cloud APIs: • 100% private — code never leaves your machine • $0 per token — no rate limits, no quotas, no surprise bills • Works fully offline • Zero telemetry, no training on your data • Swap models by dropping in a different .gguf — no app changes needed • Run 32k–128k context windows without burning money Good combos: • Cursor + Qwopus-v2 → near-frontier quality, zero API cost • Continue + Qwen3.6-27B → best local coding agent • aider + Gemma 4-12B MTP → 162 tok/s, feels instant • OpenCode + Nex-N2-Mini → deep reasoning on 16GB Set any OpenAI-compatible client to your local endpoint: set OPENAI_API_KEY=sk-dummy (any non-empty string works) set OPENAI_BASE_URL=http://localhost:8080/v1 # every OpenAI-compatible tool now hits your local GPU Shoutouts: @0xSero @rS_alonewolf @witcheer @UnslothAI @LottoLabs

206

285K

rancomsci retweeted

Phoenix Yin

@Phoenixyin13

4 days ago

如果你想了解Transformer架构的硬伤，这篇今年4月的论文非常有洞见且及时。这篇论文的核心吐槽就是，Transformer，即现在主流AI用的那种架构在长期记住和更新动态状态上天生有局限。它就像一个超级聪明的一次性扫描器，每次看到一长串文字，就从头到尾扫一遍，找出关联。但是，它并不擅长持续跟踪evolving state。这次，论文用拓扑这种几何结构的数学角度证明，Transformer把状态越推越深，深度用完了就卡住了。这是结构性的硬伤。如果你在搞AI Agent，或者像我一样天天做Suno音乐生成，或者未来想做Music Tech，这篇指的方向很好。未来好用的AI需要混入循环、recurrent机制，比如Mamba、RWKV，或者Transformer与循环的混合体。 2026年AI scaling下，这更像在为post-Transformer时代铺路。 OpenAI，Anthropic可能已在内部探索。为什么o1-style reasoning有效但贵已经在论文有所解答，同时，论文也预示着未来高效long-context不只靠更大KV cache，架构创新也是非常重要的一环。作为AI交叉背景的同学，这能帮我更好理解Human-AI Interaction中state tracking的cognitive modeling问题。读完这篇论文之后，科研上，我会优先看recurrent axis强的模型，比如Mamba、RWKV、looped transformers、coarse SSM。训练时，可以探索下multi-stage，先feedforward pretrain，再加recurrence fine-tune来解决效率问题。

713

133

767

81K

rancomsci retweeted

Vincent | 信号＞噪音

@VincentLogic

4 days ago

一个印度小哥花半个月做了一只 AI 机器，成本不到700块这只小家伙叫"核桃"，别看便宜，功能一点不像玩具首先走路不是预设动作，是用强化学习训练出来的。视频里左边电脑屏幕上能看到训练曲线和 3D 仿真画面——先在模拟环境里让它自己学怎么走，练几百万次，然后把模型部署到实体机器人上。走出来的步态很自然，不是那种机械的一抬一落然后它有视觉感知。装了摄像头，画面右上角显示"Feeling Suspicious"和"MOVING"——它不只是能"看"，还能根据环境变化产生状态反馈最厉害的是接了大语言模型做语音交互。开发者跟它说 hello 它有反应，说 go back to sleep 然后把它按倒，它就真的趴下不动了 700块钱做出强化学习步态 + 语音交互 + 视觉感知，这个性价比太离谱了宇树最便宜的机器狗也要大几千，波士顿动力那些更不用说。这个项目证明了具身智能的门槛正在被打到地板上

171

768

112K

rancomsci retweeted

GenZGrind @_mrhrd

7 days ago

GUE KAGET GILA BRO! Temen gue gaji UMR Jakarta Rp5,3 juta. Tapi rumahnya KPR, mobilnya cash, tabungannya 150jt. Langsung gue nanya, “Bro, lo nyambi jadi anjelo apa gimana sih?!” Dia ketawa pelan, terus jawab satu kalimat yang bikin gue diem:

_mrhrd's tweet photo. GUE KAGET GILA BRO!

Temen gue gaji UMR Jakarta Rp5,3 juta. Tapi rumahnya KPR, mobilnya cash, tabungannya 150jt.

Langsung gue nanya, “Bro, lo nyambi jadi anjelo apa gimana sih?!”

Dia ketawa pelan, terus jawab satu kalimat yang bikin gue diem: https://t.co/RAlx7jbDK1

156

631K

Random @rancomsci

15 days ago

@bakuldimsum_ IQ monyet itu nyata ya?

Random @rancomsci

15 days ago

@Stakof @fairuz_azamie Bukan tidak cocok, tapi dirimu yang kurang memahami konteks dakwah Gus Baha. Katanya deket dengan dengan Gus Baha? Ketimbang posting yang bikin orang salah paham mending tabayyun dan minta penjelasan detailnya ke Gus Baha. Atau status ini memang sengaja bikin gergeran?

Random @rancomsci

15 days ago

@ayamgota Nih org ngaku heteroseksual berarti dia juga suka sesama jenis. Narasi sesat tak bermoral ini gak perlu disetujui atau diterima, kalo lu terima sama aja lu kek dia, gak bermoral dan anomali, dah gjtu aja.

Random @rancomsci

16 days ago

@senogp Para pengiri ngumpul di sini semua wkwkwkwk

rancomsci retweeted

marcus

@marcusyul

18 days ago

atlassian factura $1.79 mil millones al trimestre. despidieron al ingeniero que construyó su infraestructura. ¿la respuesta del ingeniero? publicar un vídeo de 38 minutos contando exactamente cómo lo hizo. gratis, para todo el mundo. lo que reveló: → Envoy proxy en vez de load balancers de empresa → arquitectura sidecar para auth, logs y rate limits → DynamoDB + SQS para aprovisionamiento asíncrono → Packer + SaltStack para desplegar VMs a escala atlassian cobraba a 350.000 clientes diferentes. el señor que lo diseñó acaba de darte el mismo manual por cero dólares. sí, 0$. les acaba de destruir. guarda esto.

rancomsci retweeted

Han🍀

@0xhanyfa

18 days ago

Seorang ahli saraf ngabisin waktu 20 tahun buat ngebuktiin kalau nulis pakai tangan ternyata bisa ngubah cara kerja otak dengan cara yang nggak bakal bisa ditiru sama ngetik. Tapi lucunya, hampir nggak ada yang pernah baca hasil penelitiannya. Ini yang dia temuin:

19K

518K

rancomsci retweeted

Aiiiii

@bvtrass

21 days ago

Pengalaman 9 bulan ngonten YT SHORT dari nol sampai menghasilkan 2 digit perbulan: 1. Stop overthinking, langsung praktek serius gaes action itu mahal banget. kalupun kamu punya ide sekeren atau sebagus apapun itu, kalo nggak di eksekusi yang tetep aja 0. mending langsung mulai, trial and eror, dan learning by doing 2. Fokus dengan apa yang kamu punya Jangan bandingkan dirimu dengan orang lain yang punya HP mahal atau PC canggih. aku mulai cuma dari HP, dan itu pun HP biasa.

bvtrass's tweet photo. Pengalaman 9 bulan ngonten YT SHORT dari nol sampai menghasilkan 2 digit perbulan:

1. Stop overthinking, langsung praktek
serius gaes action itu mahal banget. kalupun kamu punya ide sekeren atau sebagus apapun itu, kalo nggak di eksekusi yang tetep aja 0. mending langsung mulai, trial and eror, dan learning by doing

2. Fokus dengan apa yang kamu punya
Jangan bandingkan dirimu dengan orang lain yang punya HP mahal atau PC canggih. aku mulai cuma dari HP, dan itu pun HP biasa.

124

816

Random

@rancomsci

Last Seen Users on Sotwe

Trends for you

Most Popular Users