Nico @nlienard - Twitter Profile

Pinned Tweet

Nico @nlienard

over 10 years ago

This is going to end with a tinfoil crown on our heads ;-) https://t.co/GwEMF0feKP

zooko🛡🦓🦓🦓 ⓩ

@zooko

over 10 years ago

How to steal a GPG key from the next room: https://t.co/DI68b3CWPr By Eran Tromer (https://t.co/75ue55geIm) et al.

9

496

542

1

0

13

3

0

Nico @nlienard

1 day ago

@ostrisai @ideogram_ai is it possible to train on a 3090?

1

0

138

Nico @nlienard

2 months ago

@wildmindai does it work with 4B and 9B ?

0

434

Nico @nlienard

3 months ago

@BeastySilentd @_SaxX_ ok, I found it: https://t.co/2WGPl1Gp5Y but there's no 27B version yet

0

1

0

67

Nico @nlienard

3 months ago

@KuittinenPetri @bnjmn_marie following @sudoingX's tips, I'm using 27B with Opus Reasoning: https://t.co/QBI9ylk5xB (Q4 K_M for my 3090)

0

1

120

Nico @nlienard

3 months ago

@bnjmn_marie try with Opus reasoning, it helps avoid looping while thinking: https://t.co/QBI9ylk5xB

1

3

0

4

463

Nico @nlienard

3 months ago

@BeastySilentd @_SaxX_ where did you find it? I'm on https://t.co/T2038HA1ZA in Q4 K_M and I'd like the Heretic one

1

0

86

Nico @nlienard

3 months ago

@_SaxX_ yes, have you tested this one? https://t.co/lWb8RS24uL I'm looking for the Heretic version in Q4 K_M (3090).

0

95

Nico @nlienard

3 months ago

@stevibe @sudoingX type "ollama list" to see the GGUF you are using

0

1

0

126

Nico @nlienard

3 months ago

@sudoingX 4 tokens/s on a 3090 with these settings, any idea why?
set LLAMA_DIR=D:\IA\llama.cpp
set MODEL=D:\IA\models-llama\Qwen3.5-27B-Q4_K_M.gguf
set PORT=8000
"%LLAMA_DIR%\llama-server.exe" -m "%MODEL%" -ngl 99 -c 262144 -fa on --cache-type-k q4_0 --cache-type-v q4_0
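
The settings in the tweet above can also be assembled programmatically. The following is a minimal Python sketch, not the author's script: the function name `build_llama_server_cmd` and its parameters are hypothetical, and only the flags quoted in the tweet are used, plus `--port`, which is an addition here because the tweet sets `PORT` but never passes it to the server. It does not explain the 4 tokens/s result.

```python
def build_llama_server_cmd(llama_dir, model, port=None, ctx=262144, kv_type="q4_0"):
    """Assemble the llama-server argument list quoted in the tweet.

    Hypothetical helper: names and defaults mirror the tweet's values.
    """
    cmd = [
        llama_dir + "\\llama-server.exe",
        "-m", model,                  # path to the GGUF model file
        "-ngl", "99",                 # offload up to 99 layers to the GPU
        "-c", str(ctx),               # context length in tokens
        "-fa", "on",                  # flash attention
        "--cache-type-k", kv_type,    # KV cache quantization (keys)
        "--cache-type-v", kv_type,    # KV cache quantization (values)
    ]
    if port is not None:
        # Assumption: the PORT variable in the tweet was meant to be passed here.
        cmd += ["--port", str(port)]
    return cmd


cmd = build_llama_server_cmd(
    r"D:\IA\llama.cpp",
    r"D:\IA\models-llama\Qwen3.5-27B-Q4_K_M.gguf",
    port=8000,
)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd)` on a machine where those paths exist.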

0

74

Nico @nlienard

3 months ago

@sudoingX why not add "-np 1" in that case? No difference?

0

34

Nico @nlienard

3 months ago

@bnjmn_marie @wuxi_China Look at @sudoingX's post, he is doing a lot of tests on qwen3.5 and, according to him, it is the sweet spot

Sudo su

@sudoingX

3 months ago

first impressions of qwen 3.5 27B dense on a single RTX 3090.

35 tok/s. from 4K all the way to 300K+ context. no speed drop. hermes 4.3 started at 35 and degraded to 15 as context filled. qwen dense holds. MoE held 112 flat. 3x faster but only 3B of 35B active per token. architecture tradeoff.

Q4_K_M on 16.7GB. native context 262K. pushed past training limit to 376K before VRAM ceiling on 24GB. tried q8 KV cache at 262K, speed collapsed to 11 tok/s. q4_0 KV is the sweet spot. flash attention mandatory.

built in reasoning mode. the model thinks step by step before it answers. full chain of thought surviving Q4 quant. 1,799+ token thinking chains with self correction loops. on a single consumer GPU.

gave it one prompt: "build a realtime particle galaxy simulation in one HTML file." 3,340 tokens. 95 seconds. one shot. ran on first load. full reasoning and coding in the video below.

optimal config if you want to skip the hours of testing:
llama-server -ngl 99 -c 262144 -fa on --cache-type-k q4_0 --cache-type-v q4_0

this is just the warmup. octopus invaders is next: 10 files, 3,400+ lines, zero steering. the prompt hermes quit at 22%. already more impressed than expected. full results coming soon.
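
The throughput figures in the post above can be cross-checked with simple arithmetic. This is only a sketch using the numbers quoted in the tweet (3,340 tokens in 95 seconds, a 1,799-token thinking chain, 35 tok/s); the function names are hypothetical.

```python
def tokens_per_second(tokens, seconds):
    """Average generation rate over a whole run."""
    return tokens / seconds


def seconds_for(tokens, rate):
    """Time needed to generate `tokens` at a steady `rate` in tok/s."""
    return tokens / rate


rate = tokens_per_second(3340, 95)
print(f"{rate:.1f} tok/s")  # prints "35.2 tok/s", in line with the quoted 35 tok/s

thinking = seconds_for(1799, 35)
print(f"{thinking:.1f} s")  # prints "51.4 s" for a 1,799-token thinking chain
```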

38

748

56

539

120K

0

84

Nico @nlienard

3 months ago

@iMilnb Llama.cpp is pretty hardcore on optimization too. You should follow @sudoingX, he runs lots of tests on all the qwen3.5 models

1

2

0

86

Nico @nlienard

3 months ago

@bluetouff It's great, isn't it?

1

0

322

Nico @nlienard

3 months ago

@KaelirRises @OpenAI Which GPU? Which quant? What context length? With llama.cpp? 🙏

1

0

315

Nico @nlienard

3 months ago

@wuxi_China @bnjmn_marie What about q4_0 KV using llama-server -ngl 99 -c 262144 -fa on --cache-type-k q4_0 --cache-type-v q4_0 on a 3090?

1

0

55

Nico @nlienard

3 months ago

@sudoingX In the end, we can run both models (27B dense or 35B A3B MoE) with 262k context length on a single 3090. MoE is 90 t/s and dense is 35 t/s, but which one has better quality? ☺️

3

2

0

1

723

Nico @nlienard

3 months ago

@MaxiBernard_ Which model is it? Flux Klein 9B?

0

17

Nico @nlienard

3 months ago

@waltonoemi @sudoingX just tried again and i'm at 80 now

0

1

0

28

Nico @nlienard

3 months ago

@waltonoemi @sudoingX what is the cache version command? I run it like this

1

0

37
