dextune @djdextune - Twitter Profile

dextune

@djdextune

about 4 hours ago

Mind-blowing agent coding performance. Just how powerful is Mythos? Though I bet the pricing is just as "powerful"

0

7

dextune

@djdextune

about 4 hours ago

Freebuff is a great free vibe-coding tool. If you're new to vibe-coding and hesitant to spend money, give Freebuff a try. It gets the job done. https://t.co/w25gIqMWQG

dextune

@djdextune

8 days ago

I tried out FreeBuff, a newly released AI agent coding tool. How to install: npm install -g freebuff Getting started: Once installed, run freebuff and you'll be prompted to log in via GitHub. Currently, it supports DeepSeek V4 Flash and Mimo 2.5. It doesn't seem to be completely unlimited. Based on the message "0.2 of 5 sessions used, resets in 46m," it looks like they provide a certain amount of usage that resets after a specific period. Also, a text ad pops up at the bottom while the coding is in progress. That said, there don't seem to be any restrictions on using the available models themselves. Bottom line: Just install it and start using it! https://t.co/2xfPpgMVYf

djdextune's tweet photo. I tried out FreeBuff, a newly released AI agent coding tool.

How to install: npm install -g freebuff

Getting started: Once installed, run freebuff and you'll be prompted to log in via GitHub.

Currently, it supports DeepSeek V4 Flash and Mimo 2.5.

It doesn't seem to be completely unlimited. Based on the message "0.2 of 5 sessions used, resets in 46m," it looks like they provide a certain amount of usage that resets after a specific period. Also, a text ad pops up at the bottom while the coding is in progress.

That said, there don't seem to be any restrictions on using the available models themselves.

Bottom line: Just install it and start using it!

https://t.co/2xfPpgMVYf

0

1

0

1

252

0

1

0

26

dextune

@djdextune

about 15 hours ago

@ai_hakase_ Hello World. https://t.co/B80qFDnPCI

dextune

@djdextune

1 day ago

Gemma4 12B QAT Q4 + MTP + llama.cpp @ RTX3060 12GB llama.cpp https://t.co/tdidkUZ8SI unsloth/gemma-4-12B-it-qat-GGUF https://t.co/YKHcvYWbBl Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF https://t.co/JsQ53h1Nzr ~/llama.cpp/build/bin/llama-server \ -m ~/models/gemma4-12b/gemma-4-12B-it-qat-UD-Q4_K_XL.gguf \ -md ~/models/gemma4-12b/gemma-4-12B-it-qat-assistant-MTP-Q8_0.gguf \ -ngl 99 \ --spec-draft-ngl 99 \ -c 131072 \ --cache-type-k q8_0 \ --cache-type-v q8_0 \ --spec-draft-type-k q8_0 \ --spec-draft-type-v q8_0 \ --batch-size 2048 \ --ubatch-size 512 \ --jinja \ --spec-type draft-mtp \ --spec-draft-n-max 4 \ --temp 0.2 --top-p 0.85 --min-p 0.05 \ --host 0.0.0.0 --port 8080

1

2

1

294

0

22

dextune

@djdextune

about 15 hours ago

@PCONES It's gonna snap any second now;;

0

69

Who to follow

Phil Hardscore

@Philextreme

DJextreme | Oldskool Hardcore, Jungle, Drum & Bass. For Live, Studio & Pirate Radio mixes all with track listings check https://t.co/IcMmCvuX6d | https://t.co/zPf7840CK3

Lazaro Casanova

@lazarocasanova

Mi nombre es Lazaro. Mi planeta es Jupiter.

about 15 hours ago

@CoinSh0t amzing

0

76

dextune

@djdextune

about 15 hours ago

Yes! AI fever in Korea is actually insane. 🇰🇷

0

16

dextune

@djdextune

about 15 hours ago

@robinebers Korean devs and vibecoders are in absolute FOMO MAX mode right now. We literally feel like AI is going to delete all our jobs by tomorrow morning! 😭😂

0

13

dextune

@djdextune

about 16 hours ago

Running Gemma4 12B with a 256K context is possible on an RTX 3060 12GB, but expect a bit of a slowdown in speed. Gemma4 QAT Q4 + MTP + llama.cpp + 256K CTX @ RTX3060 12GB ~/llama.cpp/build/bin/llama-server \ -m ~/models/gemma4-12b/gemma-4-12B-it-qat-UD-Q4_K_XL.gguf \ -md ~/models/gemma4-12b/gemma-4-12B-it-qat-assistant-MTP-Q8_0.gguf \ -ngl 99 \ --spec-draft-ngl 99 \ -c 262144 \ -fa on \ --cache-type-k q8_0 \ --cache-type-v q8_0 \ --spec-draft-type-k q8_0 \ --spec-draft-type-v q8_0 \ --batch-size 2048 \ --ubatch-size 1024 \ --jinja \ --spec-type draft-mtp \ --spec-draft-n-max 4 \ --temp 0.2 --top-p 0.85 --min-p 0.05 \ --host 0.0.0.0 --port 8080

0

2

0

111

dextune

@djdextune

about 17 hours ago

@LottoLabs Need more GPU 🥲🥲

0

12

dextune

@djdextune

about 17 hours ago

Looking for the llama.cpp version that supports Gemma4 MTP? Support for Gemma4 MTP has recently been merged into llama.cpp. This means you can now run Gemma4 MTP by building it straight from the official llama.cpp repo. https://t.co/WR277k679g

0

47

dextune

@djdextune

about 20 hours ago

When running LLMs with VRAM+CPU offloading, MTP draft types matter: Embedded: Draft heads load into VRAM by default -> high risk of OOM External: Separate file, so you can adjust GPU layers independently. Much easier to offload to CPU If VRAM is tight, always go with external drafts. If using embedded, keep a massive VRAM buffer!

0

18

dextune

@djdextune

about 22 hours ago

Gemma4 26B QAT Q4 + MTP + llama.cpp @ RTX 3060 12GB You shouldn't expect much from MTP in MoE models. Still, since it's out, I gave it a spin. Based on the 3060 12GB, the VRAM was insufficient, so I did CPU offloading. Compared to previous tests, it seems to come out about 15-20% more. The files I received are as follows. The driving command is listed below. If VRAM is insufficient, increase the -ncmoe value, and if it is sufficient, decrease or remove it. Let's just do it for now. gemma-4-26B_q4_0-it.gguf https://t.co/lo9cINqpHk gemma-4-26b-A4B-it-assistant-Q4_0-q4emb.gguf https://t.co/LVl8XL5VA4 ~/llama.cpp/build/bin/llama-server \ -m ~/models/gemma4-26b/gemma-4-26B_q4_0-it.gguf \ -md ~/models/gemma4-26b/gemma-4-26b-A4B-it-assistant-Q4_0-q4emb.gguf \ --host 0.0.0.0 \ --port 8080 \ -c 131072 \ -ncmoe 13 \ -fa on \ -ctk q8_0 \ -ctv q8_0 \ -b 1024 \ -ub 256 \ -np 1 \ --no-mmap \ --reasoning off \ -t 8 \ --spec-draft-ngl 99 \ --spec-draft-type-k q8_0 \ --spec-draft-type-v q8_0 \ --spec-type draft-mtp \ --spec-draft-n-max 4 \ --metrics

$djdextune's tweet photo. Gemma4 26B QAT Q4 + MTP + llama.cpp @ RTX 3060 12GB You shouldn't expect much from MTP in MoE models. Still, since it's out, I gave it a spin. Based on the 3060 12GB, the VRAM was insufficient, so I did CPU offloading. Compared to previous tests, it seems to come out about 15-20% more. The files I received are as follows. The driving command is listed below. If VRAM is insufficient, increase the -ncmoe value, and if it is sufficient, decrease or remove it. Let's just do it for now. gemma-4-26B_q4_0-it.gguf https://t.co/lo9cINqpHk gemma-4-26b-A4B-it-assistant-Q4_0-q4emb.gguf https://t.co/LVl8XL5VA4 ~/llama.cpp/build/bin/llama-server \ -m ~/models/gemma4-26b/gemma-4-26B_q4_0-it.gguf \ -md ~/models/gemma4-26b/gemma-4-26b-A4B-it-assistant-Q4_0-q4emb.gguf \ --host 0.0.0.0 \ --port 8080 \ -c 131072 \ -ncmoe 13 \ -fa on \ -ctk q8_0 \ -ctv q8_0 \ -b 1024 \ -ub 256 \ -np 1 \ --no-mmap \ --reasoning off \ -t 8 \ --spec-draft-ngl 99 \ --spec-draft-type-k q8_0 \ --spec-draft-type-v q8_0 \ --spec-type draft-mtp \ --spec-draft-n-max 4 \ --metrics$

0

2

0

156

dextune

@djdextune

about 24 hours ago

I tested the Gemma4 31B QAT + MTP setup on an RTX 3060 12GB with CPU offloading. It turns out, running a dense model that goes over the VRAM limit is definitely a bad idea. I got around 5-6 TPS, which is pretty abysmal. 😂😂

0

75

dextune

@djdextune

1 day ago

@sudoingX When Gemma4 12B first came out, it had some issues with looping or freezing mid-sentence, but running it on the recently updated llama.cpp improved it a lot. You should give it a try too. https://t.co/B80qFDnPCI

dextune

@djdextune

1 day ago

Gemma4 12B QAT Q4 + MTP + llama.cpp @ RTX3060 12GB llama.cpp https://t.co/tdidkUZ8SI unsloth/gemma-4-12B-it-qat-GGUF https://t.co/YKHcvYWbBl Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF https://t.co/JsQ53h1Nzr ~/llama.cpp/build/bin/llama-server \ -m ~/models/gemma4-12b/gemma-4-12B-it-qat-UD-Q4_K_XL.gguf \ -md ~/models/gemma4-12b/gemma-4-12B-it-qat-assistant-MTP-Q8_0.gguf \ -ngl 99 \ --spec-draft-ngl 99 \ -c 131072 \ --cache-type-k q8_0 \ --cache-type-v q8_0 \ --spec-draft-type-k q8_0 \ --spec-draft-type-v q8_0 \ --batch-size 2048 \ --ubatch-size 512 \ --jinja \ --spec-type draft-mtp \ --spec-draft-n-max 4 \ --temp 0.2 --top-p 0.85 --min-p 0.05 \ --host 0.0.0.0 --port 8080

1

2

1

294

0

74

dextune

@djdextune

1 day ago

@___Harald___ 놀랍다. 갖고 싶어.

0

227

dextune

@djdextune

1 day ago

Options can be adjusted based on your personal circumstances. To run it with 8GB of RAM, CPU offloading will likely be required.

0

33

dextune

@djdextune

1 day ago

Gemma4 12B QAT Q4 + MTP + llama.cpp @ RTX3060 12GB llama.cpp https://t.co/tdidkUZ8SI unsloth/gemma-4-12B-it-qat-GGUF https://t.co/YKHcvYWbBl Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF https://t.co/JsQ53h1Nzr ~/llama.cpp/build/bin/llama-server \ -m ~/models/gemma4-12b/gemma-4-12B-it-qat-UD-Q4_K_XL.gguf \ -md ~/models/gemma4-12b/gemma-4-12B-it-qat-assistant-MTP-Q8_0.gguf \ -ngl 99 \ --spec-draft-ngl 99 \ -c 131072 \ --cache-type-k q8_0 \ --cache-type-v q8_0 \ --spec-draft-type-k q8_0 \ --spec-draft-type-v q8_0 \ --batch-size 2048 \ --ubatch-size 512 \ --jinja \ --spec-type draft-mtp \ --spec-draft-n-max 4 \ --temp 0.2 --top-p 0.85 --min-p 0.05 \ --host 0.0.0.0 --port 8080

1

2

1

294

dextune

@djdextune

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users