Tipu Sultan 🇵🇰 @engrtipusultan - Twitter Profile

Pinned Tweet

2 months ago

A bee does not waste its energy trying to convince a fly that honey is better than shit, it simply goes on about its business. Not every mind is open to growth, ego builds walls so thick that wisdom simply walks away. Conserve your energy for something that matters.

Tipu Sultan 🇵🇰 @engrtipusultan

11 days ago

@ZhihuFrontier Why was MTP stripped from the model? Do you plan to add it back?

Tipu Sultan 🇵🇰 @engrtipusultan

about 1 month ago

@hkdennis2k @populartourist No, it does not. It is hardcoded in llama.cpp.

Tipu Sultan 🇵🇰 @engrtipusultan

about 1 month ago

@Howaboua Maybe you can also provide a setting to point to the Pi directory.

Tipu Sultan 🇵🇰 @engrtipusultan

about 1 month ago

@Howaboua I will try this one.

Tipu Sultan 🇵🇰 @engrtipusultan

about 2 months ago

@mick__net @leftcurvedev_ I have similar results on the Vulkan backend with an AMD APU: about a 30% increase with n max 2. That is the best increase; increasing n max or adding ngram reduces the speed gains.

Tipu Sultan 🇵🇰 @engrtipusultan

about 2 months ago

@witcheer Maybe its implementation has improved. When I checked it on the llama.cpp Vulkan backend, it had a severe drop in pp as context increased.

Tipu Sultan 🇵🇰 @engrtipusultan

about 2 months ago

@FunkyClam @danielhanchen The PR is not merged into mainline llama.cpp yet, so the only way to use it right now is to merge the PR manually and build llama.cpp yourself. If you use a prebuilt llama.cpp release or Lmstudio, you will have to wait for it. A PR for DFlash is also in draft in the llama.cpp repo.

Tipu Sultan 🇵🇰 @engrtipusultan

about 2 months ago

@nicodeory @letstri @zeddotdev Have you tried the ACP adapter in Zed? You can connect it to Pi or other agents. Pi in Zed works fine for me.

Tipu Sultan 🇵🇰 @engrtipusultan

2 months ago

I 100% agree with him. I need to install pi now 😅

Mario Zechner

@badlogicgames

3 months ago

I'm usually not one to write thought pieces without much technical depth. But here we go. Slow the fuck down. https://t.co/dcOwPK357F

Tipu Sultan 🇵🇰 @engrtipusultan

2 months ago

@Wronglebowsk @ggerganov llama-bench cannot support it. This type of speculative decoding depends on your previous prompt and the cache of the previous conversation, whereas llama-bench uses dummy data. In real use, ngram gives no advantage in creative writing or open-ended reasoning either, only in coding.

Tipu Sultan 🇵🇰 @engrtipusultan

2 months ago

Damnn. I gave a problem to Google Gemini 3.1 Pro and gave the same problem to Qwen3.6 Max Preview. Qwen gave a better solution with a more comprehensive response. On another note, the improvement in AI over the last year is scary good. Traditional IT jobs are in absolute danger.

Tipu Sultan 🇵🇰 @engrtipusultan

3 months ago

@WuMinghao_nlp Are you going to continue with the next architecture? The good thing about that architecture was the very small drop in tps, for both pp and tg, as context increased.

Tipu Sultan 🇵🇰 @engrtipusultan

3 months ago

@TeksEdge That is not EAGLE-3 based speculative decoding.

Tipu Sultan 🇵🇰 @engrtipusultan

3 months ago

@support_huihui Are there any benchmarks or is it an assumption?

Tipu Sultan 🇵🇰 @engrtipusultan

3 months ago

@TeksEdge @lmstudio Which backend are you using in Lmstudio? My understanding was that the llama.cpp Eagle3 PR is still in draft.

Tipu Sultan 🇵🇰 @engrtipusultan

3 months ago

@basecampbernie Btw, just for my understanding, are you using Vulkan or CUDA?

Tipu Sultan 🇵🇰 @engrtipusultan

3 months ago

@basecampbernie This is interesting: Gemma4 is faster than Qwen3.5 A3B at q8 on your machine, and the reverse is true at q4. I think DGX Spark is Blackwell architecture, so you could also try bf16 variants rather than q8 XL (which I believe is q8 plus fp16). For these models bf16 might be faster, or maybe try nvfp4.