Marcos Fernandez @platformalchemy - Twitter Profile

Marcos Fernandez

@platformalchemy

5 days ago

@steipete @jamesingram9999 So true, I am suddenly a very prolific Golang coder haha 🤣

0

17

Marcos Fernandez

@platformalchemy

5 days ago

Oh yea... now if you can mirror the production deployment and do live endpoint testing... db schema upgrades and retain functionality without regressing... you can ensure a full end-to-end production delivery pipeline... just make sure to figure out your ci/cd system and rely as little as possible on external systems. You'll start getting throttled everywhere. Github doesn't like this. Already moved entirely to self-hosting. Nothing can handle my throughput without getting rate limited or throttled.

0

1

0

1

484

Marcos Fernandez

@platformalchemy

7 days ago

@steipete If they are taking 4 to 10 hrs... aren't they more token intensive? More expensive? What's most cost effective? Rocketfuel isn't cheap.

0

129

Marcos Fernandez

@platformalchemy

14 days ago

@steipete I have been using a headless playwright docker skill; playwright already has a docker image with all the tools in it.

1

0

1

1K

15 days ago

Let's see how this works out.

0

10

Marcos Fernandez

@platformalchemy

about 1 month ago

Preliminarily, this would be quite the speedup, at least 2x over qwen3.6 on an rtx5090.

0

52

Marcos Fernandez

@platformalchemy

about 1 month ago

Checking out this claim... my earlier findings were that Qwen3.6-35b-a3b was faster.

Google for Developers

@googledevs

about 1 month ago

Gemma 4: Now up to 3x Faster. ⚡ Same quality, way more speed. Our new MTP drafters allow Gemma 4 to predict multiple tokens at once, effectively tripling your output speed without compromising intelligence.

167

6K

627

2K

835K

1

0

89

Marcos Fernandez

@platformalchemy

about 1 month ago

@sojoodi @escander007 @steipete yes sir, I just switched. The difference is definitely tangible and I mean... all the billing drama, they are crashing out from my POV.

0

1

0

31

Marcos Fernandez

@platformalchemy

about 1 month ago

@Orion_Maximus I've cancelled all Anthropic subscriptions. I felt I was incorrectly billed, but I mean my feeling and my ability to verify are two separate things. Therefore, it's just a feeling, not a true smoking gun.

0

5

Marcos Fernandez

@platformalchemy

about 1 month ago

Check your Anthropic Billing Settings and Turn-Off or Limit Extra Usage.

1

0

36

Marcos Fernandez

@platformalchemy

about 1 month ago

@mikeassad77 @AlexFinn Right, same here. Not sure why gemma4 gets so much praise. Qwen3.6-35b-a3b runs faster than gemma4 and has better kv cache compression with turboquant.

0

1

0

98

Marcos Fernandez

@platformalchemy

about 1 month ago

@Esongsofficial @AlexFinn @kilocode There are no vibes anymore in his stack. It is all self planning and autonomous. Like my stack. Been on and off, it is just tough to keep up.

0

1

0

8

Marcos Fernandez

@platformalchemy

about 1 month ago

@Megalion_ent @AlexFinn Same question. I get better performance out of qwen3.6-35b-a3b.

1

0

45

Marcos Fernandez

@platformalchemy

about 1 month ago

@tipofthespear78 @AlexFinn Use the MoE version. Also depends on which mac mini. Memory bandwidth matters. I am switching to an rtx5090, which is much faster than the minis. Much more expensive tho. But it is faster than the m5max.

0

1

0

83

Marcos Fernandez

@platformalchemy

about 1 month ago

@AlexFinn Gemma4 over Qwen3.6? I keep getting better performance on Qwen3.6. How does Gemma4 win over Qwen3.6? Can't justify it. Too expensive to run Gemma4 on the 5090. Gemma4 3 lanes vs Qwen3.6 4 lanes w/turbo quant.

0

1

0

1

630

Marcos Fernandez

@platformalchemy

about 1 month ago

@theo It's time to move on. I found myself very surprised with open source models lately. To the point I don't even miss opus, nor sonnet nor haiku. And I am saving money now.

3

30

1

6K

Marcos Fernandez

@platformalchemy

about 1 month ago

@steipete did you manage with github actions? or is the openclaw bot doing the work? I tried for like 3 months with github actions and gave up. It worked for a little bit, but then it'd be too unreliable. Now I am leaning towards just having openclaw work on it directly in the repo.

0

60

Marcos Fernandez

@platformalchemy

about 1 month ago

@steipete @C8Luna 💨

0

98

platformalchemy retweeted

Graeme

@gkisokay

about 2 months ago

The Local LLM Cheat Sheet for your 32GB RAM device

I was asked to put together a practical lineup of local models that fit comfortably on a 32GB machine.

At this tier, you start getting access to real flagship-class local models, plus a growing number of custom quants. But for most people, these are the core models worth knowing first.

Flagship Models

Qwen3.5 27B / GGUF / Q6_K_M
The best overall 32GB flagship. General chat, writing, research, and agent workflows. Great if you want one model that can handle almost everything well.

Qwen3.6-35B-A3B / GGUF / UD-Q4_K_M
Best MoE flagship. Stronger for coding, reasoning, and tool use than most smaller generalists.

Gemma 4 31B / GGUF / Q6_K_M
Dense premium model. Writing, analysis, reasoning, and high-end local chat. Heavier than the MoE options, but excellent when quality matters more than speed.

Models for Fast Flagship Use

Gemma 4 26B A4B / GGUF / Q6_K_M
Great balance of speed and quality for general assistant work, coding, agent tasks, and research. This is one of the best 32GB picks if you want something that feels high-end without dragging.

DeepSeek-R1 Distill Qwen 32B / GGUF / Q4_K_M
Offline reasoning engine. Best for math, logic, deliberate analysis, and step-by-step problem solving.

Mistral Small 24B / GGUF / Q6_K_M
Tool-calling specialist. Strong for assistants, chat workflows, local business tasks, and function calling. Available for 24GB machines.

Models for Companion Use

Qwen3.5 9B / GGUF / Q6_K_M
Best sidekick. Fast drafts, search loops, cheap retries, and secondary agent work. Even on a 32GB machine, you still want a smaller model around for support tasks.

Llama 3.1 8B / GGUF / Q6_K_M
Long-context companion. RAG, doc ingestion, codebase chat, and long prompts. The output quality is not the sharpest anymore, but it is still useful when needing simple tasks fast.

From what my community tells me, the best single models are Qwen3.5 27B or Gemma 4 31B.

For two models, the strongest general pairing is Qwen3.5 27B + Qwen3.5 9B.

If you are more code-heavy, Qwen3.6-35B-A3B + Llama 3.1 8B.

Let me know what models you are running on 32GB, and which ones have actually been worth the RAM.
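A rough way to sanity-check whether a model from the list above fits in 32GB is to multiply parameter count by bits per weight. The sketch below is only an estimate under stated assumptions: the bits-per-weight figures are approximate values for llama.cpp-style K-quants (the sheet's "Q6_K_M" label is treated here as Q6_K), the 6 GB headroom for KV cache and the OS is a guess, and the function names are hypothetical.

```python
# Approximate bits per weight for two quant families.
# Assumed values for illustration, not measurements.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56}


def estimate_weight_gb(params_billion: float, quant: str) -> float:
    """Rough size of the weights alone, in GB (10^9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    # params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billion * bits / 8


def fits(params_billion: float, quant: str,
         ram_gb: float = 32.0, headroom_gb: float = 6.0) -> bool:
    """True if weights plus an assumed headroom fit in the given RAM."""
    return estimate_weight_gb(params_billion, quant) + headroom_gb <= ram_gb


if __name__ == "__main__":
    for name, size, quant in [
        ("Qwen3.5 27B", 27, "Q6_K"),
        ("Qwen3.6-35B-A3B", 35, "Q4_K_M"),
        ("Gemma 4 31B", 31, "Q6_K"),
    ]:
        gb = estimate_weight_gb(size, quant)
        print(f"{name}: ~{gb:.1f} GB weights, fits in 32GB: {fits(size, quant)}")
```

Real usage depends on context length, KV cache settings, and the runtime, so treat the result as a first filter rather than a guarantee.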

84

2K

347

3K

311K

Marcos Fernandez

@platformalchemy

about 2 months ago

@steipete Need to get traffic replays working instead of doing live API calls. Plus, it's likely cheaper than inference cost or api token usage. On a ci/cd system with every build running api key tests, that's a lot of api calls.
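One way the traffic-replay idea could look in practice is a small record/replay cache: the first run records real responses to disk, and later CI builds read them back without touching the API. This is a minimal sketch, not the author's setup; `ReplayCache`, `live_call`, and the on-disk layout are all hypothetical names chosen for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class ReplayCache:
    """Record real API responses once, then replay them in later runs."""

    def __init__(self, directory: str,
                 live_call: Callable[[dict], dict],
                 record: bool = False):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.live_call = live_call  # hypothetical function that hits the real API
        self.record = record        # only allow live calls when explicitly recording

    def _path(self, request: dict) -> Path:
        # Stable key: hash of the canonical JSON form of the request.
        canonical = json.dumps(request, sort_keys=True).encode("utf-8")
        return self.directory / (hashlib.sha256(canonical).hexdigest() + ".json")

    def call(self, request: dict) -> dict:
        path = self._path(request)
        if path.exists():
            # Replay: no network traffic, no tokens spent.
            return json.loads(path.read_text(encoding="utf-8"))
        if not self.record:
            raise LookupError("no recorded response for this request")
        response = self.live_call(request)
        path.write_text(json.dumps(response), encoding="utf-8")
        return response
```

In CI the cache would run with `record=False`, so a build fails loudly on an unrecorded request instead of silently spending API calls.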

0

3K
