Carlo @CarloFCesar - Twitter Profile

I calculated roughly what I’d need to go full local with my project, and I’d need 2 @NVIDIAAI DGX Sparks 🤑💰 For now 24GB RAM will have to do haha! But I also believe local models will become smaller, better and perhaps even a commodity. The apps we build on top (output) and the energy that goes in (input) might become the true value!

0

19

about 1 month ago

@AlexFinn Went there from Openclaw, started my project from scratch. First 48h left a great impression!

0

1

0

386

Carlo

@CarloFCesar

2 months ago

@elroyic @AlexFinn Why not 70B? You could quantize to FP8 or even NVFP4. And then, how does Qwen relate to, for example, Llama?
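A rough way to sanity-check the 70B question is to estimate weight memory at each precision. This is only a sketch: it counts weights alone (no KV cache, no activations, no runtime overhead) and treats NVFP4 as a flat 4 bits per weight, ignoring its per-block scale factors. The function name is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in decimal GB.

    Assumption: every parameter is stored at the same bit width and
    nothing else (KV cache, activations, scales) is counted.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("FP16", 16), ("FP8", 8), ("~4-bit (NVFP4-like)", 4)]:
        print(f"70B at {label}: ~{weight_memory_gb(70, bits):.0f} GB of weights")
```

The real footprint will be higher than these figures once context and runtime buffers are added, so treat them as lower bounds.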

0

28

Carlo

@CarloFCesar

2 months ago

@luminousmind_co @AlexFinn That's a great addition, thanks @luminousmind_co

0

24

Carlo

@CarloFCesar

2 months ago

@E_Jellerson @CardilloSamuel @VadimStrizheus Yup… lessons learned

0

13

Carlo

@CarloFCesar

2 months ago

What would you run if you have 2x DGX Spark (and 1x Mac mini 24GB…) and want to go 100% local? Multi-agent system that needs to run computations and cross-references 24/7 on an ever-growing vectorized data pool. Nemoclaw strongly preferred.
- Yes I’m relatively new to it
- Yes I first got the Mac mini and learned a lot (hence the upgrade)
- Yes I need some help here. 😂

0

249

Carlo

@CarloFCesar

3 months ago

If you don’t read up on the AI updates for more than 24 hours you feel sooooo behind

0

1

0

37

Carlo

@CarloFCesar

3 months ago

My lesson for today: ''Sleeping turns knowledge into wisdom''. Not a one-to-one quote, but it's the concept. Comes from: https://t.co/AI1sAKXQd2

0

1

0

49

Carlo

@CarloFCesar

3 months ago

@sidelined_cap Agree there man! I use Qwen 3.6 via OpenRouter only to do online peer-reviewed research for me

0

1

0

86

Carlo

@CarloFCesar

3 months ago

@witcheer @AlexFinn I just added Qwen3.6-plus-preview for non-sensitive prompts and crons (research pulls in my case); previously I ran these on Haiku or Sonnet. Moving on! https://t.co/qs12WaN7lx

0

100

Carlo

@CarloFCesar

3 months ago

Just cut my agent's cloud costs significantly without sacrificing quality. Thanks to @witcheer and @AlexFinn, I started building from your posts!
The journey:
Started on Mac mini M4 (24GB…) running Claude Haiku/Sonnet for all background tasks, ~60 API calls/day.
Tried Qwen3.5-35B-A3B via Ollama first. 23GB model. OOM. Killed.
Tried Qwen2.5-14B. Fit fine, but reasoning quality too weak for my workflow.
Then TurboQuant dropped @GoogleResearch. KV cache compression, 4.6× smaller memory footprint!
My new stack: Qwen3.5-27B-IQ3_XXS (10.7GB) via llama.cpp TurboQuant fork → 13.6 tok/s on M4 Pro → zero API cost.
Validation before shipping:
→ 15 tool-use scenarios: 12/12 pass
→ Shadow test against Haiku on live data: output indistinguishable
→ Did have 3 timeout failures on error recovery, but a lot of crons can run overnight with a higher timeout, not a quality issue
My full model stack today:
🟢 Local (free)
→ Qwen3.5-27B IQ3_XXS: health monitoring, training load, environmental logging. (Based on multimodal biomarkers, keen on getting that data vectorized!)
→ MedGemma 4B: offline domain-specific model
🔵 Anthropic
→ Claude Sonnet: interactive sessions, memory synthesis
→ Claude Haiku: research briefings, clinical alerts, complex reasoning chains
🟡 Google
→ Gemini 2.5 Pro: fallback when Anthropic unavailable
⚫ DeepSeek
→ DeepSeek V3.2: cost-efficient tasks when applicable
The "local = cheap but dumb" assumption is breaking down fast fast faster! Probably it is already outdated, goes so fast! Curious to get it smarter and cheaper every day 🥳
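The model stack in the post above amounts to a routing table from task type to model, with a fallback when Anthropic is unavailable. A minimal sketch of that idea might look like the following. The task labels, the `pick_model` function, and the table layout are hypothetical; only the model names and their assigned roles come from the post.

```python
# Task labels are hypothetical; model names and roles are taken from the post.
LOCAL = "Qwen3.5-27B IQ3_XXS"

ROUTES = {
    "health_monitoring": LOCAL,
    "training_load": LOCAL,
    "environmental_logging": LOCAL,
    "interactive_session": "Claude Sonnet",
    "memory_synthesis": "Claude Sonnet",
    "research_briefing": "Claude Haiku",
    "clinical_alert": "Claude Haiku",
}

ANTHROPIC_MODELS = {"Claude Sonnet", "Claude Haiku"}
FALLBACK = "Gemini 2.5 Pro"  # used when Anthropic is unavailable


def pick_model(task: str, anthropic_available: bool = True) -> str:
    """Return the model for a task, swapping in the fallback if needed."""
    model = ROUTES[task]  # raises KeyError for unknown task labels
    if model in ANTHROPIC_MODELS and not anthropic_available:
        return FALLBACK
    return model


if __name__ == "__main__":
    print(pick_model("health_monitoring"))
    print(pick_model("research_briefing", anthropic_available=False))
```

Keeping the routing in one table makes it easy to move a task from cloud to local as local models improve.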

1

0

1

325

Carlo

@CarloFCesar

3 months ago

@VadimStrizheus Ok. This already changed.... just added Qwen3.6-plus-preview (free tier). Nothing sensitive or private. I run my research crons and prompts through it now instead of Sonnet 4.6 https://t.co/qs12WaN7lx Things move fassssst

0

112

Carlo

@CarloFCesar

3 months ago

@VadimStrizheus I’d do Qwen 27B, not 35B, to be honest. With 35B you won’t have enough headroom. And run TurboQuant on top to compress the KV cache. Then make sure to have some cloud API for when you really need it
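The headroom argument can be sketched as a simple budget check, using the model sizes quoted elsewhere in this feed (23GB for the 35B build, 10.7GB for the 27B IQ3_XXS build) on a 24GB machine. The `reserved_gb` and `kv_cache_gb` values below are assumptions chosen for illustration, not measurements, and the function name is hypothetical.

```python
def fits_in_memory(
    model_gb: float,
    total_ram_gb: float,
    kv_cache_gb: float = 0.0,
    reserved_gb: float = 6.0,  # assumed allowance for OS and other apps
) -> bool:
    """Return True if weights plus KV cache fit in the remaining budget."""
    budget_gb = total_ram_gb - reserved_gb
    return model_gb + kv_cache_gb <= budget_gb


if __name__ == "__main__":
    # 23GB model on a 24GB machine: no room left for anything else.
    print(fits_in_memory(model_gb=23.0, total_ram_gb=24.0))
    # 10.7GB model with an assumed 2GB KV cache: fits with room to spare.
    print(fits_in_memory(model_gb=10.7, total_ram_gb=24.0, kv_cache_gb=2.0))
```

Compressing the KV cache lowers `kv_cache_gb`, which is why it frees room for longer contexts on the same hardware.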

2

4

0

4

2K

Carlo

@CarloFCesar

3 months ago

Things really do go fast! Just set all my online research crons and prompts to run on Qwen3.6 Plus Preview (free tier) on OpenRouter. No personal data, no MEMORY.md, no strategy, nothing sensitive because Alibaba collects prompts on the free tier. Check it out here: https://t.co/0lfGTovTtw
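The rule in the post above (free tier only for jobs with no personal data, no MEMORY.md, and no strategy content) could be enforced with a small guard before dispatch. This is a sketch under assumptions: the `Job` fields and the `allowed_on_free_tier` function are hypothetical names, and in practice the flags would have to be set by whatever builds the prompt.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical description of a scheduled prompt or cron job."""
    name: str
    has_personal_data: bool = False
    reads_memory_file: bool = False  # e.g. includes MEMORY.md content
    has_strategy_notes: bool = False


def allowed_on_free_tier(job: Job) -> bool:
    """Allow a job on a prompt-collecting free tier only if nothing is sensitive."""
    return not (
        job.has_personal_data
        or job.reads_memory_file
        or job.has_strategy_notes
    )


if __name__ == "__main__":
    print(allowed_on_free_tier(Job("research_pull")))
    print(allowed_on_free_tier(Job("daily_summary", reads_memory_file=True)))
```

Defaulting to "deny" whenever any flag is set keeps mistakes on the safe side: a mislabeled job goes to a paid or local model instead of leaking.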

0

1

0

138

Carlo

@CarloFCesar

3 months ago

@no_stp_on_snek @VadimStrizheus Exactly this ye!

0

1

0

121
