START HERE: everything I wish someone told me before I built my homelab.
Servers, local AI, Hackintosh, home networks.
No blogspam. No affiliate links. Just working config files and real-world setups. 🧵
Claude Opus 4.8 is out today. Better agentic coding, sharper judgment, and notably more honest about its own progress, same price as 4.7.
Which makes Apple’s stance even more absurd: the M-series iPad has a Unix core and the horsepower to run TUI agents like Claude Code… but iPadOS still ships with no terminal, no shell, no command line.
The hardware is a workstation. The OS won’t let it act like one. Give iPadOS a native terminal, @Apple. The agents are ready, the sandbox isn’t.
@SummarySeriesUK 3060 is solid for 7B-14B at Q4. Main thing I would add: test tokens/sec with your actual GGUF before calling it done, because Ollama defaults can leave performance on the table. Watch nvidia-smi during a long prompt and check actual GPU utilization.
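A minimal sketch of that tokens/sec check, assuming a local Ollama server on its default port. It relies on the `eval_count` and `eval_duration` (nanoseconds) fields Ollama returns with a finished `/api/generate` response; the function names are mine, not part of any library.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Generation speed from the metrics Ollama attaches to a finished response."""
    # eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def benchmark(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation against a local Ollama server, return tokens/sec."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

Call `benchmark()` with the model tag you actually run and a prompt about as long as your real ones, then repeat after each settings change so the numbers are comparable.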
@AllThingsTec 262k context on 16GB Mac is brutal. Create a Modelfile with PARAMETER num_ctx 8192 and see the speed difference immediately. The model will still handle long conversations, just with less prefix overhead.
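For reference, the Modelfile in question is two lines. Here is a small sketch that renders it; the helper name and the model tag in the example are placeholders, so swap in whatever you already pulled.

```python
def render_modelfile(base_model: str, num_ctx: int = 8192) -> str:
    """Return Modelfile text that pins the context window for a base model."""
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


if __name__ == "__main__":
    # "llama3.1:8b" is only a placeholder tag for illustration.
    print(render_modelfile("llama3.1:8b"), end="")
```

Save the output as `Modelfile`, then build it with `ollama create <new-name> -f Modelfile` and run the new name instead of the base model.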
@xoofx Have you checked how many layers are actually offloaded to GPU? Partial CPU offload kills throughput in Ollama. Try PARAMETER num_gpu 999 in a Modelfile and watch nvidia-smi during inference.
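If you would rather log nvidia-smi than eyeball it, something like this works as a rough sketch. It uses nvidia-smi's CSV query mode; the dict keys are my own naming, and it assumes the driver reports plain integers (some GPUs return `[N/A]` for a field, which this does not handle).

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> list[dict]:
    """Turn nvidia-smi CSV output (one line per GPU) into a list of dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (int(field.strip()) for field in line.split(","))
        stats.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return stats


def sample_gpu() -> list[dict]:
    """Take one reading from every visible GPU."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(output)
```

Low utilization with VRAM far from full during generation is a hint that some layers are still running on the CPU.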
@socialwithaayan 0.5GB numbers look clean but sustained inference is where it gets ugly. KV cache on edge quants blows up fast with ctx length. Test under real prompts, not a cold load, and watch nvidia-smi through the whole session.
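To put a number on "blows up fast": an unquantized KV cache grows linearly with context length. A back-of-the-envelope sketch, where the shape (32 layers, 8 KV heads, head dim 128) is an assumed example and should be replaced with your model's real values:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_element: float = 2.0) -> float:
    """Approximate KV cache size; the default of 2 bytes per element is f16."""
    # K and V each store n_kv_heads * head_dim values per token per layer.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_element


if __name__ == "__main__":
    # Assumed example shape, not taken from any specific model card.
    for n_ctx in (2048, 8192, 32768, 131072):
        gib = kv_cache_bytes(32, 8, 128, n_ctx) / 2**30
        print(f"ctx={n_ctx:>6}  kv_cache={gib:.2f} GiB")
```

The weights stay the same size the whole session; this term is the one that keeps growing as the conversation fills the context.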
@djkenogata If you have not done it yet, an SSD swap is the single biggest upgrade for a 2015 MBP. OCLP can get you to Sequoia, but for something like 2026+ browser workloads, that 5th gen dual-core will struggle no matter what.
@oscarmartin That flag makes the biggest difference for MoE when VRAM is tight. On 8 GB the sweet spot is usually between 23-27. On 12 GB it runs 30-38. You have to tune it step by step and watch nvidia-smi, it is not the same on every card.
@codeastar The 1.2 overhead factor is solid but shifts with context length. KV cache quant (--cache-type-k q8_0 --cache-type-v q4_0) changes the math too, especially for longer prompts. Worth checking actual usage with nvidia-smi or --verbose.
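A rough sketch of how much the cache types change the math. The bytes-per-element figures come from the ggml block layouts (q8_0 packs 32 values into 34 bytes, q4_0 packs 32 into 18); the model shape is an assumed example, and the function name is mine.

```python
# Bytes per cached element for each cache type.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def quantized_kv_bytes(n_layers: int, n_kv_heads: int, head_dim: int, n_ctx: int,
                       type_k: str = "f16", type_v: str = "f16") -> float:
    """Approximate KV cache size when K and V use different cache types."""
    elements_per_tensor = n_layers * n_kv_heads * head_dim * n_ctx
    return elements_per_tensor * (BYTES_PER_ELEMENT[type_k] + BYTES_PER_ELEMENT[type_v])


if __name__ == "__main__":
    # Assumed example shape: 32 layers, 8 KV heads, head dim 128, 8192 context.
    shape = (32, 8, 128, 8192)
    for type_k, type_v in (("f16", "f16"), ("q8_0", "q8_0"), ("q8_0", "q4_0")):
        mib = quantized_kv_bytes(*shape, type_k, type_v) / 2**20
        print(f"K={type_k:<5} V={type_v:<5} {mib:8.1f} MiB")
```

This only estimates the cache itself. Compute buffers and the weights sit on top of it, so the real check is still nvidia-smi under load.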
@Crashoverride_X @Chaos2Cured KV cache quant is underused. Also worth testing asymmetric K vs V quant (--cache-type-k q8_0 --cache-type-v q4_0). K cache error feeds straight into the attention softmax, so K is usually the more sensitive one, while V often tolerates lower precision. Saves more VRAM for model weights on tight cards.
@onusoz OpenClaw plus Telegram on top of Ollama is a solid stack. Main thing to test before going live: what happens when the model hits num_ctx mid-conversation. Long threads eat RAM fast on iGPU.
@ARTLANDTIS1 RX 560 working clean on Haswell without framebuffer patches is a solid result. Most Polaris cards need WhateverGreen -radcodec or a device-id spoof on older platforms. Any custom device properties injected or stock config?
@blue_zima1 @YouTube Also worth testing a PBS restore to a different node while the first VM is still broken: different storage layout, missing mount, then boot. It catches bridge and bond drift that a single-path restore misses.
@blue_zima1 @YouTube For Proxmox beginners, I’d make the first lab deliberately ugly: one VM, one LXC, one VLAN tag, then restore both from PBS. That catches most storage and bridge mistakes early.