llama.cpp now has an official website: https://t.co/vztdUpdBWL
Our goal is to make local AI accessible to everyone, and improving the user experience is a big part of that. On the new landing page you’ll find a single-line cross-platform installer. The installation provides a single unified `llama` entrypoint which you can use to run/serve models and interface with 3rd-party agentic applications.
While oriented towards a simplified user experience, the new `llama` application also provides all the advanced functionality of the existing llama.cpp tooling that experienced users are already familiar with. Also note that any GGUF models you have already downloaded with llama.cpp will be automatically available to use without downloading again (they are stored in the common HF cache on your machine).
We have many improvements in the pipeline both at the UX and at the engine level and we plan to iteratively ship new things over the coming months. One of the main focuses will be seamless integration with local-friendly 3rd-party agents (such as Pi). In the meantime, we’ll continue to listen for feedback from the community and adjust accordingly, so keep letting us know what you think and need.
For DGX Spark owners: this is what you get with DS4 on your hardware. I'm posting this to show that, with fast prefill and not very fast generation, the system remains absolutely fine to use.
@NoobFunctor I've been using it for over 15 years and have never encountered any problems with propagation. There are many other parts that are more problematic, though.
llama.cpp adds MTP for the Qwen3.6 family
This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and further elevates local inference on commodity hardware.
Special thanks to Aman Gupta for leading this development!
https://t.co/vjaMwEpIaR
I just pushed a big refactoring of DS4 backends with CUDA support and single-direction activation steering. The Metal path should be unaffected. Note: I only support hardware I own (or have full access to), so just M3 (no M5 NE for now) and DGX Spark.