Paul Swenson

@pdscomp

Systems Engineer, developer, self-hosted AI enthusiast, 3D printer hacker, musician, skier/rock climber, #doodledad! (he/him) @[email protected]

Greenbelt, MD

Joined February 2009

446 Following

187 Followers

3.9K Posts

29 days ago

I was able to bind a specific model to a specific Telegram channel; it looks like this: telegram:-1001234567123:8: model: default: gemini-2.5-flash provider: custom:google-aistudio I use this channel for research, and it will always respond with gemini-2.5-flash by default. Ask the /hermes-agent skill to set this up for you, from the channel in question! It can figure out the number by grepping the gateway logs and add it to config.yaml!


about 1 month ago

@mikascend @0xSero I'm having good luck with the MiniMax coding plan and M2.7!


about 1 month ago

@sudoingX I don't consider performance improvements to be the main benefit of TurboQuant (although in cases where you have constrained memory bandwidth, running the kv cache on tq3/tq3 can help there); it's mostly larger context or better quality (e.g. asymmetric q8_0/tq4 vs. q4_0/q4_0)
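
The "larger context or better quality" trade-off in the post above comes down to how many bits each cached K and V value takes. Below is a rough sketch of that arithmetic, not a measurement of any real setup. The function name and the model shape are hypothetical, and the formula assumes a standard attention layout where every layer caches K and V for all tokens. The post does not give bit widths for the tq/turbo types, so they are left for the reader to pass in. The f16, q8_0 and q4_0 widths follow from llama.cpp's block layouts.

```python
# Bits per cached value for formats whose block layout is well known in llama.cpp:
# f16 is 16 bits; q8_0 stores 32 values in 34 bytes; q4_0 stores 32 values in 18 bytes.
BITS_PER_VALUE = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_k, bits_v):
    """Approximate KV cache size in bytes for one sequence of n_ctx tokens."""
    # K and V each hold this many values across all layers.
    values_per_side = n_layers * n_kv_heads * head_dim * n_ctx
    return values_per_side * (bits_k + bits_v) / 8


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(n_layers=64, n_kv_heads=8, head_dim=128, n_ctx=256 * 1024)
    pairs = [("f16", "f16"), ("q8_0", "q8_0"), ("q8_0", "q4_0"), ("q4_0", "q4_0")]
    for k_type, v_type in pairs:
        size = kv_cache_bytes(
            bits_k=BITS_PER_VALUE[k_type], bits_v=BITS_PER_VALUE[v_type], **shape
        )
        print(f"K={k_type:5s} V={v_type:5s} -> {size / 2**30:.1f} GiB")
```

At a fixed memory budget, fewer bits per value either frees room for more context or lets one side (typically K) stay at a higher-precision format, which is the asymmetric case the post describes.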


about 1 month ago

@sudoingX Anon label your axis 😭


about 2 months ago

@DragonGroky @0xSero I've spent a lot of time optimizing llama.cpp for 3.5-27b dense and now 3.6-35b on a single 3090! Configs here: https://t.co/0FyvrHTmKW


about 2 months ago

@sudoingX @spiritbuun @no_stp_on_snek I hadn't tried the 31b dense Gemma yet so just ran some initial tests. It runs but is not very happy w/ asym kv-cache or K_M/XL. To get decent performance, used Q4_K_S, turbo4/turbo4, and 224k context, stable 28t/s w/ Hermes! Full cfg pushed to my repo: https://t.co/0FyvrHTmKW


about 2 months ago

@sudoingX You should be able to fit 256k ctx in 24G w/ Gemma via asymmetric q8_0/turbo4 or, worst case, turbo4/turbo4 with little quality loss using the @spiritbuun or @no_stp_on_snek TurboQuant llama.cpp forks. Check out my repo for how to pull and build them from source! 💜 https://t.co/0FyvrHTmKW


about 2 months ago

@i_loder @sudoingX I'm running similar cfg to @sudoingX, just on WSLv2 + Docker. CUDA just works. Try out my heavily optimized configs, now w/ Qwen3.6, full 256k context, Q4_K_M/XL, asymmetric q8/TurboQuant4 here, single card 24GB VRAM 106t/s on my rig. Hermes works amazing! https://t.co/0FyvrHTmKW


about 2 months ago

@sudoingX Going with dense until Carnice or 3.6-35b proves itself


about 2 months ago

@sudoingX This is the one I'm most excited about 💕


about 2 months ago

@DragonGroky @MyopicRaccoon @LottoLabs @no_stp_on_snek I'm getting 35t/s on WSL2 (Docker Desktop + Debian Trixie WSL container), RTX3090 connected via Oculink dock, with this config: https://t.co/0FyvrHTmKW I point my Linux host running Hermes to WindowsHost:8080, works amazing! Can also run Hermes right in WSL2 and use localhost.


about 2 months ago

@AgentArchetype @tubatrades @sudoingX @Teknium @NousResearch I finally had a chance to properly document my config and automate setup: https://t.co/0FyvrHTmKW My TurboQuant (tx @no_stp_on_snek @spiritbuun!) config gives 35t/s, RTX3090 fits Qwen3.5-27B-UD-Q4_K_XL, MAX 256k context and effective q8 kv-cache (asymmetric q8_0/turbo4)--insane!


about 2 months ago

@AgentArchetype @tubatrades @sudoingX @Teknium @NousResearch I am running this exact setup, and it's very reliable! I recommend installing Docker Desktop on Windows, integrating it with WSL 2 so that "docker run hello-world" runs successfully, and then running a llama.cpp server cuda13 Docker image to host your local LLM! DM me for configs!


about 2 months ago

@sudoingX I finally had a chance to properly document my config and automate setup: https://t.co/0FyvrHTmKW My TurboQuant (tx @no_stp_on_snek @spiritbuun!) config gives 35t/s, RTX3090 fits Qwen3.5-27B-UD-Q4_K_XL, MAX 256k context and effective q8 kv-cache (asymmetric q8_0/turbo4)--insane!


about 2 months ago

@OnlyTerp @sudoingX Qwen 27b with a bigger Unsloth quant, 256k context window and asymmetric q8_0 K / turbo4 V cache. Check out @no_stp_on_snek's work to bring turbo to llama.cpp!


about 2 months ago

@kriskarols @sudoingX Definitely! Running that config. T/s goes down a bit vs. fitting in a single card, but it can do more! 3090+3090 works well, but so does 5070Ti+3090. I have the two 3090s in Oculink docks so I can deploy them tactically 😁


about 2 months ago

@sudoingX Secret post for subscribers only with more deets on what's coming pls 🤤


2 months ago

@MyopicRaccoon @DragonGroky @LottoLabs @no_stp_on_snek I'm running Q4_K_XL with q8_0 k and turbo4 V (should be near zero loss) on a single 3090, ~33 t/s! Interesting idea jumping up to the Q5 though!


2 months ago

@ChujieZheng 24-35b dense please ❤️❤️❤️


2 months ago

@0xSero Droid!!!
