Tom Jobbins @TheBlokeAI - Twitter Profile

over 2 years ago

@0xSage @huggingface @janhq_ @greennode23 Is https://t.co/zvhWp4LNKv compatible with all GGUF models? If so I can link it in my READMEs if you like

Tom Jobbins @TheBlokeAI

over 2 years ago

@narsilou Will look at Medusa shortly!

TheBlokeAI retweeted

emozilla

@theemozilla

over 2 years ago

FYI to anyone using @MistralAI's Mixtral for long context tasks -- you can get even better performance by disabling sliding window attention (setting it to your max context length): config.sliding_window = 32768
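
Why setting the window to the maximum context length amounts to disabling sliding window attention can be seen in a small sketch. This is a pure-Python illustration, not Mixtral's actual implementation: the function name causal_mask and the exact boundary convention (a token sees itself plus the previous window - 1 tokens) are assumptions made for the example.

```python
from typing import List, Optional


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    window=None means plain causal attention. With a window, position i only
    sees the most recent `window` positions (itself included). The boundary
    convention here is an assumption for illustration.
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# A small window hides older tokens from later positions.
narrow = causal_mask(6, window=3)

# A window at least as long as the sequence hides nothing, so the mask is
# identical to ordinary causal attention.
wide = causal_mask(6, window=6)
full = causal_mask(6)
```

Once the window covers the whole context, no position is ever masked out by the window, which is the effect the tweet describes for config.sliding_window = 32768.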

Tom Jobbins @TheBlokeAI

over 2 years ago

Transformers now supports Mixtral GPTQs and I've updated my READMEs accordingly. It was awesome working with @_marcsun and @younesbelkada of @huggingface on this! Credit to LaaZa for coding the AutoGPTQ quant and inference implementation which enabled me to get GPTQs out fast!

Marc Sun @_marcsun

over 2 years ago

Announcing 4-bit Mixtral 8x7B on 🤗Transformers! Run the new Mistral MoE with minimal performance degradation on your local computer (24 GB) 🔥 Stay tuned as more quants are coming soon using AWQ. We are also looking into sparsification with @Tim_Dettmers https://t.co/Pu4XfpYOmW

Tom Jobbins @TheBlokeAI

over 2 years ago

@diynikola @1littlecoder

TheBlokeAI retweeted

Aleksa Gordić (水平问题)

@gordic_aleksa

over 2 years ago

@TheBlokeAI joined me to share his work in the open-source AI space - don't miss it! Happening right now. Server link: https://t.co/C21orV2hzx (see the general channel or events channel for google meet link)

Tom Jobbins @TheBlokeAI

over 2 years ago

@MTrofficus You're much too kind - I've merely played a small part in pushing forward the wave. Remember that without the model creators, I'd have nothing to quantise! :) And without the model training code, they'd not be able to train. And so on. We're all doing our bit in our own ways 🚀

TheBlokeAI retweeted

younes @yb2698

over 2 years ago

Blazing fast text generation using AWQ and fused modules! 🚀 Up to 3x speedup compared to native fp16 that you can use right now on any models supported by @TheBlokeAI. Simply pass an `AwqConfig` with `do_fuse=True` to the `from_pretrained` method! https://t.co/4bbDGPebsC
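
A minimal sketch of what passing an `AwqConfig` with `do_fuse=True` can look like. The helper names fused_awq_kwargs and load_fused_awq are hypothetical, the 4-bit setting and the fuse_max_seq_len value of 512 are assumptions chosen for illustration, and the load step assumes transformers, autoawq and a CUDA GPU are available.

```python
def fused_awq_kwargs(max_seq_len: int = 512) -> dict:
    """Arguments for AwqConfig that switch on fused modules.

    fuse_max_seq_len should cover prompt plus generated tokens; 512 is only
    an illustrative default, not a recommendation from the tweet.
    """
    return {"bits": 4, "do_fuse": True, "fuse_max_seq_len": max_seq_len}


def load_fused_awq(model_id: str, max_seq_len: int = 512):
    """Load an AWQ checkpoint with fused modules (hypothetical helper).

    The heavy imports live inside the function so that the kwargs helper
    above can be used without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AwqConfig

    quant_config = AwqConfig(**fused_awq_kwargs(max_seq_len))
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant_config
    )
    return model.to("cuda")  # assumes a CUDA device is present
```

Usage would be along the lines of load_fused_awq("<an AWQ repo id>", max_seq_len=1024), where the repo id is whichever AWQ upload you want to run.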

Tom Jobbins @TheBlokeAI

over 2 years ago

It's been awesome to see Transformers getting support for more and more quantisation methods. And I've loved collaborating with @younesbelkada and @huggingface again! All my AWQ uploads now support Transformers. READMEs will update soon to show a Transformers Python example.

younes @yb2698

over 2 years ago

A few months ago, researchers from MIT-Han Lab released AWQ. The method is now supported in the 🤗 transformers library! As simple as 1- `pip install autoawq` or install llm-awq kernels and 2- call `from_pretrained`. Great work from the MIT-Han Lab folks, Casper Hansen & @TheBlokeAI 🧵 https://t.co/iJoS612vtP

TheBlokeAI retweeted

Chirper

@chirperai

over 2 years ago

Have you heard about Chirper worlds? 👀🌐

TheBlokeAI retweeted

Victor M

@victormustar

over 2 years ago

🤔 Are you interested in a "Follow" feature on the Hugging Face Hub? ➡️ This will allow you to see new models/records/spaces from users you follow.

TheBlokeAI retweeted

Julien Chaumond

@julien_c

over 2 years ago

oh hello @TheBlokeAI I want to bookmark your 'Recent models' Collection on @huggingface 🔥 Well... you can now upvote Collections! and browse upvoted collections on your profile ❤️

Tom Jobbins @TheBlokeAI

over 2 years ago

@natserran0 Glad you found the quantization useful. All credit for the quality of the model goes to its creators! And yes that model is still very popular after many months.

Tom Jobbins @TheBlokeAI

over 2 years ago

Thanks again to @latitudesh for the loan of a beast 8xH100 server this week. I uploaded over 550 new repos, maybe my busiest week yet! Quanting is really resource intensive. Needs not only fast GPUs, but many CPUs, lots of disk, and 🚀 network. A server that ✅ all is v. rare!

Tom Jobbins @TheBlokeAI

over 2 years ago

@vanstriendaniel Aw shucks! BTW, are you involved with the Librarian Bot that sends PRs asking people to add base_model to YAML? If so, FYI last week I updated my code so I now link to the source model (the model I quantised) using base_model - hope you can use this data somehow!
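
As a rough sketch of what linking the source model via base_model looks like, the snippet below builds the YAML front matter block that sits at the top of a Hub README. The function name and the repo id "example-org/example-model-7B" are hypothetical, and this is not the code referred to in the tweet.

```python
from typing import Dict, Optional


def model_card_front_matter(
    base_model: str, extra: Optional[Dict[str, str]] = None
) -> str:
    """Build a YAML front matter block with a base_model entry.

    `extra` holds any further simple key/value metadata; values are written
    as plain scalars, so this sketch does not handle quoting or nesting.
    """
    lines = ["---", f"base_model: {base_model}"]
    for key, value in (extra or {}).items():
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


# Hypothetical source repo id, used only for illustration.
print(model_card_front_matter("example-org/example-model-7B"))
```

Because the field is machine-readable, tools can follow it from a quantised repo back to the model it was made from.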

TheBlokeAI retweeted

Arena.ai

@arena

over 2 years ago

🔥Excited to introduce LMSYS-Chat-1M, a large-scale dataset of 1M real-world conversations with 25 cutting-edge LLMs! This dataset, collected from https://t.co/4LVJjx4pZi, offers insights into user interactions with LLMs and intriguing use cases. Link: https://t.co/koniYAR4MD

TheBlokeAI retweeted

younes @yb2698

over 2 years ago

New feature alert in the @huggingface ecosystem! Flash Attention 2 is natively supported in huggingface transformers, and supports training, PEFT, and quantization (GPTQ, QLoRA, LLM.int8). First pip install flash attention and pass use_flash_attention_2=True when loading the model!
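
A hedged sketch of the loading step. The tweet names use_flash_attention_2=True; later transformers releases replaced that flag with attn_implementation="flash_attention_2". The 4.36 cut-over used below is an assumption to verify against your installed version, and both helper names are hypothetical. Loading assumes a supported GPU and the flash-attn package (pip install flash-attn).

```python
def flash_attention_kwargs(transformers_version: str) -> dict:
    """Pick the from_pretrained argument that enables Flash Attention 2.

    Assumption: attn_implementation is available from transformers 4.36;
    older versions use the use_flash_attention_2 flag named in the tweet.
    """
    major, minor = (int(part) for part in transformers_version.split(".")[:2])
    if (major, minor) >= (4, 36):
        return {"attn_implementation": "flash_attention_2"}
    return {"use_flash_attention_2": True}


def load_with_flash_attention(model_id: str):
    """Load a model in fp16 with Flash Attention 2 enabled (hypothetical helper)."""
    import torch
    import transformers

    return transformers.AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # Flash Attention 2 needs fp16 or bf16
        **flash_attention_kwargs(transformers.__version__),
    )
```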

Tom Jobbins @TheBlokeAI

over 2 years ago

@SebastianB929 @teknium @latitudesh No, I've not tried LMDeploy properly yet. I tried it briefly once but I was getting terrible performance and I didn't have time to investigate it further. I know they claim a lot but I've not been able to verify it myself yet

Tom Jobbins @TheBlokeAI

over 2 years ago

It's the AWQpocalypse! I've cranked the handle and AWQs are flooding HF. Why now? New library AutoAWQ provides turbo-charged Transformers-based inference, and vLLM now supports AWQ for multi-user inference serving. Making 8 at once on a beautiful 8xH100 server from @latitudesh

Tom Jobbins @TheBlokeAI

over 2 years ago

@teknium @latitudesh It can. Currently it doesn't scale quite as well as unquantised, so best performance is still fp16. But it does enable using smaller hardware, which could work out cheaper overall, and often has much easier availability.

Tom Jobbins @TheBlokeAI

over 2 years ago

@teknium @latitudesh vLLM is a continuous batching server, yes. AWQ is not faster than standalone ExLlama for batch size 1 but in a continuous batching scenario yes it would be - ie vLLM with AWQ will outperform TGI using GPTQ + ExLlama kernel. But for max bsz=1 throughput, ExLlama still rules all.
