CKtalon @CKtalon - Twitter Profile

CKtalon @CKtalon

over 1 year ago

@abacaj That’s if they even release a GPU

0

1

0

67

CKtalon @CKtalon

about 2 years ago

@RealJosephus @suchenzang Seems like the corpus used to train the tokenizer isn’t as clean as the corpus used to train the LLM

0

375

CKtalon @CKtalon

about 2 years ago

@drummatick @laurensweitkamp @suchenzang With 200k vocabulary, it’s entirely possible to have many full words

0

1

0

65

CKtalon @CKtalon

about 2 years ago

@SashaMTL Just stating facts. BLOOM has 1131 citations despite being released in 2022, while Llama2 has 3855 despite being released 8 months later. BLOOM was just severely undertrained given the limited compute they had, with way too much ambition to cover so many languages.

0

2

0

122

CKtalon @CKtalon

over 2 years ago

@dctanner @mov_axbx That’s a really expensive server on eBay considering its age and specs. It seems any used rack that can hold more than 4 GPUs is highly inflated in price now.

1

0

36

CKtalon @CKtalon

over 2 years ago

@dctanner @mov_axbx What server rack model is that?

1

0

48

CKtalon @CKtalon

over 2 years ago

@realmrfakename @arpagon Not possible

0

2

0

88

CKtalon @CKtalon

over 2 years ago

@NVIDIAGeForce #RTXSUPER

0

19

CKtalon @CKtalon

over 2 years ago

@Yampeleg @abacaj Did similar training runs, and from some manual evaluations, the loss might have plateaued for hundreds of thousands of steps, but the quality of the generations is better given more epochs.

0

492

CKtalon @CKtalon

over 2 years ago

@charlieholtz @elevenlabs In the not-so-distant future, pairing this with the Meta Ray-Bans and having it narrate whatever you see will be mind-blowing.

0

26

CKtalon @CKtalon

over 2 years ago

@Suhail Helps when a ton of data is distilled from a powerful LLM? Phi-1.5 kind of shows that generated data can produce a powerful model.

0

43

CKtalon @CKtalon

over 2 years ago

@BramVanroy OpenNMT does have most of those implemented since they now also support LLMs. Marian looks dead, perhaps because MSFT has deprioritized it in favor of LLMs.

0

1

0

40

CKtalon @CKtalon

almost 3 years ago

@Yampeleg Just the preview shows how dirty the dataset is…

0

411

CKtalon @CKtalon

almost 3 years ago

@Science_boy_H @huggingface I suspect QLoRA or LoRA doesn’t help with adding or improving a model’s second-language capabilities

0

61

CKtalon @CKtalon

almost 3 years ago

@DanielSMatthews @nearcyan Text isn’t one-to-one. More of a translated summary

1

0

36

CKtalon @CKtalon

about 3 years ago

@_BruceX_ @ID_AA_Carmack Time is money

0

92

CKtalon @CKtalon

about 3 years ago

@e270889o @ID_AA_Carmack Plenty of RAM, but slow compute-wise. Apple’s Core ML is too opaque to developers, so the Neural Engine hasn’t been usable in an obvious way yet.

0

1

48

CKtalon @CKtalon

about 3 years ago

@decryption @coxymla Follow this: https://t.co/ci3hBBpUry

0

32

CKtalon @CKtalon

about 3 years ago

@tmophoto @abacaj You split the layers across different cards. That’s why you need fast interconnects like NVLink so that the GPUs can process the computations quickly without bottlenecks.

0

36
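
To make the layer-splitting idea in the reply above concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the function names `split_layers` and `count_transfers` are hypothetical, the even contiguous split is one simple policy among many, and no real GPU library is involved. It shows which layer indices each card would hold and how many card-to-card handoffs one forward pass needs, which is where interconnect speed starts to matter.

```python
def split_layers(num_layers: int, num_devices: int) -> list[range]:
    """Assign contiguous blocks of layer indices to devices as evenly as possible.

    Hypothetical helper for illustration; real frameworks also weigh memory use.
    """
    if num_layers < 0 or num_devices <= 0:
        raise ValueError("num_layers must be >= 0 and num_devices must be > 0")
    base, extra = divmod(num_layers, num_devices)
    assignments = []
    start = 0
    for device in range(num_devices):
        # The first `extra` devices take one additional layer each.
        size = base + (1 if device < extra else 0)
        assignments.append(range(start, start + size))
        start += size
    return assignments


def count_transfers(assignments: list[range]) -> int:
    """Count device-to-device activation handoffs in one forward pass."""
    used_devices = sum(1 for block in assignments if len(block) > 0)
    return max(used_devices - 1, 0)


if __name__ == "__main__":
    plan = split_layers(32, 4)
    for device, block in enumerate(plan):
        print(f"device {device}: layers {block.start}..{block.stop - 1}")
    print("handoffs per forward pass:", count_transfers(plan))
```

Each handoff moves activations from one card to the next, so the slower the link between cards, the longer every token waits at each boundary.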
