@lmontoya @jerolba Exactly, it's as simple as having your local inference as "brute force", and then, for example, a €20 Cursor subscription to fine-tune, run diagnostics, and consult the big models.
@jerolba What I don't understand is why they price it close to the DGX Spark but without the 200Gbps high-speed networking, when the networking hardware alone is valued at around 1000-1500$.
A priori, all else being equal, for me it's a textbook NO-GO :/
Now, countries outside the US and China need to wake up to the urgency of Sovereign AI and start building.
But building a proprietary LLM from scratch means hitting a wall: they lack everything from crucial training data to funding and compute.
The move they need to make is to adopt a strategy like Cursor Composer.
They need to focus entirely on the post-training of open-weight models.
Look at Cursor. They took Kimi-K2.5, a model from two generations ago, and turned it into an Opus-level model purely through post-training, radically cutting costs.
I believe this post-training market is going to absolutely explode this year.
Our work/company can't depend on these things.
The risk is extremely high.
You can be an AI-native company if you want, but you can't depend on another company so blatantly. If you don't have your own self-hosted AIs, you're one move like this away from shutting down.
State of Local AI #1
---
In lieu of Fable ban.
Here's the best LLMs of the week to run on your hardware.
-- 4-8gb vram/ram 500$
- Gemma-4-qat https://t.co/UFCmLXVKed I had someone mention it's very good for subagent stuff
-- 8-16gb vram/ram < 1k usd
- Gemma-12B https://t.co/tc6IBTrbc3 without a doubt the smartest model of its size
-- 16-32gb Apple/Strix halo 1-2k usd
- Diffusion Gemma26B https://t.co/mSaWPFpgXQ
- on 1x 6000 it's eating up to 600 tok/s
- smallest smart MoE we have
- lots of world knowledge
- easy to run
-- 32-96gb ram/vram (2-10k usd)
- nex-n2-mini https://t.co/EL1ePzwI58 builds on qwen3.6-35B and seems to do really well
- qwopus-27B https://t.co/P1gypZwufi this model topped a lot of our benchmarks at https://t.co/UfoYoOlSIk
-- 384gb vram (10-50K usd)
- https://t.co/AZb0Gtu5P3 23B means it's close to qwen3.6-27B per token, while also having a lot of specialisation.
- fast inference
- top open weight model on AA
-- 768gb-1TB
- https://t.co/kWzJG2Hjen
Kimi has always been a top player here and their last model cuts speed and cost down by 30%
- great vision support
- first coder model by moonshot
---
Top models:
1. Qwen3.6-35B
2. Qwen3.6-27B
3. Step-3.7-Flash
4. Minimax-M3
5. Deepseek-v4-flash
---
Budget sweet spots:
#1 - 1K usd
Single 3090 / Mac mini / Intel arc b70 / AMD
- Qwen / Gemma
#2 - 5k usd
DGX Spark / Mac m5 max / 4x 3090
- qwen / Gemma step and deepseek flash
#3 - 12k usd
RTX Pro 6000 / Mac Ultra / 2x Spark / 8x 3090
Ds4-flash / step-3.7-Flash and above
#4 - 24k usd
2x 6000 / 2x Mac Ultra / 4x Spark / Mix
Same as above
#5 - 50k usd
4x 6000 / 4x Mac Ultra / 12x Spark / 2 H100
Minimax-m3 / nex-n2-pro / step-3.7-flash
#6 - 100k usd
GB300 station / 8x 6000 / 4x H200 / Mix
GLM-5.2 / Kimi-K2.7
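The budget sweet spots above can be read as a simple lookup table. A minimal sketch in Python, assuming you want the largest tier that fits a given budget; `BUDGET_TIERS` and `pick_tier` are hypothetical names, the hardware and model strings are copied from the list as written, and tier #4 reuses the tier #3 models because the list says "Same as above":

```python
from bisect import bisect_right

# (price point in USD, hardware options, suggested models), transcribed from the list above.
BUDGET_TIERS = [
    (1_000, "Single 3090 / Mac mini / Intel arc b70 / AMD", "Qwen / Gemma"),
    (5_000, "DGX Spark / Mac m5 max / 4x 3090", "qwen / Gemma step and deepseek flash"),
    (12_000, "RTX Pro 6000 / Mac Ultra / 2x Spark / 8x 3090", "Ds4-flash / step-3.7-Flash and above"),
    (24_000, "2x 6000 / 2x Mac Ultra / 4x Spark / Mix", "Ds4-flash / step-3.7-Flash and above"),
    (50_000, "4x 6000 / 4x Mac Ultra / 12x Spark / 2 H100", "Minimax-m3 / nex-n2-pro / step-3.7-flash"),
    (100_000, "GB300 station / 8x 6000 / 4x H200 / Mix", "GLM-5.2 / Kimi-K2.7"),
]


def pick_tier(budget_usd):
    """Return the largest tier whose price point does not exceed the budget, or None."""
    prices = [tier[0] for tier in BUDGET_TIERS]
    idx = bisect_right(prices, budget_usd) - 1
    return BUDGET_TIERS[idx] if idx >= 0 else None


if __name__ == "__main__":
    tier = pick_tier(6_000)
    print(tier[1], "->", tier[2])
```

A budget below the first price point returns `None` rather than guessing at a tier the list does not cover.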
---
Let's keep the Internet free. Thanks for reading.
@root_rat I've been encouraging everyone for a while now to learn with whatever they have. Fortunately there are now small but mighty models that can run on modest machines.
And besides, a great many tasks don't need a giant model to be solved.
This is the way!
The ECB holds rates at 2.25%.
Do you know how much your bank debt costs you this month? Not "more or less". The exact number.
Most SMEs aren't clear on it. And that has a concrete cost when they negotiate with the bank.
#BCE #pymes #gestionfinanciera
day 0 support on sparkrun:
sparkrun update
sparkrun run @eugr/diffusion-gemma-bf16-thinking
sparkrun run @eugr/diffusion-gemma-nvfp4-thinking
sparkrun run @eugr/diffusion-gemma-nvfp4
Check sparkrun list for other options
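The bf16 and nvfp4 variants differ mainly in weight precision, which is what decides which hardware tier a model fits in. A rough sketch of how weight memory scales with bits per parameter, assuming the 26B parameter count from the "Diffusion Gemma26B" entry applies to these builds (an assumption, not confirmed here) and ignoring KV cache, activations, and quantization scale overhead; `weight_memory_gb` is a hypothetical helper:

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # 26 is an assumed parameter count (in billions), taken from the model name.
    for label, bits in [("bf16", 16), ("4-bit (nvfp4, scales ignored)", 4)]:
        print(f"{label}: ~{weight_memory_gb(26, bits):.0f} GB of weights")
```

Real usage will be higher than these figures once context and runtime buffers are counted, so treat them as a lower bound.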
It doesn't matter whether they call you Tech Lead, Engineering Manager, Senior Engineer or anything else.
If you have to make decisions, help others or lead initiatives, this meetup is for you.
- Practical talk
- Q&A
- Networking
See you tomorrow at #FullstackSevilla!
I'm finally reading Dune. This quote, which is in the first few pages, hits hard:
"Once men turned their thinking over to machines in the hope that this would set them free. But that only permitted other men with machines to enslave them."
llama.cpp is likely the first LLM runtime in the world to allow "interrupt" reasoning without stopping the whole response.
We also added a small "skip" button to the Web UI; the model gives the final response as soon as you click it.
The response is no longer bound to the reasoning budget!