Nik Holiday

@nikholiday

Toronto, Canada

Joined November 2016

691 Following

116 Followers

213 Posts

nikholiday retweeted

about 2 months ago

A preview for Pro users: a new personal finance experience in ChatGPT. Pro users in the U.S. can securely connect financial accounts, see where their money is going, and ask questions based on the information they choose to connect. Your full financial picture, now in ChatGPT.

1K

25K

2K

12K

17M

Nik Holiday @nikholiday

11 months ago

@charles_irl So true

0

0

0

0

25

Nik Holiday @nikholiday

about 1 year ago

@rafalwilinski Cloudflare's San Francisco office, and the wall is called “Wall of Entropy”

0

10

0

0

452

Nik Holiday @nikholiday

about 1 year ago

@kentcdodds @aiDotEngineer Yeah! That kind of turnout could’ve been expected; maybe a ballroom for the most hyped tech next time? On a related note, we built Zapier MCP servers. Would love for you to check it out https://t.co/YQLLpyVPup

0

1

0

0

40


Nik Holiday @nikholiday

about 1 year ago

@rafalwilinski Well deserved💪

0

1

0

0

38

Nik Holiday @nikholiday

over 1 year ago

@vitorbal https://t.co/zvAf0yfzrW

0

1

0

0

24

Nik Holiday @nikholiday

over 1 year ago

@rafalwilinski Touché!🥋

0

1

0

0

14

Nik Holiday @nikholiday

over 1 year ago

@ckor SeekDip📈

0

1

0

0

34

Nik Holiday @nikholiday

over 1 year ago

@humanloop Nice

0

0

0

0

37

Nik Holiday @nikholiday

over 1 year ago

0

0

0

0

25

Nik Holiday @nikholiday

over 1 year ago

where there’s a will, there’s a way

1

0

0

0

60

nikholiday retweeted

almost 2 years ago


danielhanchen: My analysis for Llama 3.1

1. 15.6T tokens, Tools & Multilingual
2. Llama arch + new RoPE
3. fp16 & static fp8 quant for 405b
4. Dedicated pad token
5. <|python_tag|><|eom_id|> for tools?
6. RoBERTa to classify good-quality data
7. 6 staged 800B tokens long context expansion

Long analysis:

1. New RoPE extension method
Uses interesting low and high scaling factors to rescale the inv_freq vector, which can be computed in one go, so there is no need for dynamic recomputation. Uses a 6-stage ramp-up from 8K to 128K tokens of context over 800B tokens.
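A minimal Python sketch of this kind of one-shot inv_freq rescaling follows. The constants are assumptions taken from the publicly released Llama 3.1 config rather than from this thread: head_dim=128, theta=500000, factor=8, low_freq_factor=1, high_freq_factor=4 and an original context of 8192. The function names are hypothetical. Short-wavelength (high-frequency) components are kept as-is and long-wavelength ones are divided by the scale factor. The band in between is blended linearly, so the whole vector is computed once up front.

```python
import math

# Hypothetical helper names; constants are assumed from the released Llama 3.1 config.


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE inverse frequencies theta**(-2i/head_dim), i = 0 .. head_dim/2 - 1."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def llama31_scale_inv_freq(
    inv_freq: list[float],
    factor: float = 8.0,
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_max_pos: int = 8192,
) -> list[float]:
    """Rescale every inverse frequency once, up front (no dynamic recomputation)."""
    low_freq_wavelen = original_max_pos / low_freq_factor    # 8192 with the defaults
    high_freq_wavelen = original_max_pos / high_freq_factor  # 2048 with the defaults
    scaled = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            scaled.append(f)  # high frequency: keep unchanged
        elif wavelen > low_freq_wavelen:
            scaled.append(f / factor)  # low frequency: stretch by the full factor
        else:
            # smooth is 0 at low_freq_wavelen and 1 at high_freq_wavelen
            smooth = (original_max_pos / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * f / factor + smooth * f)
    return scaled


inv_freq = rope_inv_freq(head_dim=128, theta=500_000.0)
scaled = llama31_scale_inv_freq(inv_freq)
```

Because the rescaled vector depends only on fixed constants, it can be cached alongside the model instead of being recomputed as the sequence length grows.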

2. Training
38% to 43% MFU using bfloat16. Uses pipeline parallelism plus FSDP. Model averaging for the RM, SFT & DPO stages.
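The MFU figure can be sanity-checked with the common 6 * N FLOPs-per-token estimate for a training step (forward plus backward, ignoring attention FLOPs). This sketch is hypothetical. The function names and example numbers are illustrative, not Meta's. The uniform averaging helper is only one simple form of model averaging, since the thread does not say how checkpoints were weighted.

```python
# Hypothetical helpers for illustration only.


def mfu(n_params: float, tokens_per_sec: float, n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization, using ~6 * n_params FLOPs per trained token."""
    achieved_flops = 6 * n_params * tokens_per_sec
    return achieved_flops / (n_gpus * peak_flops_per_gpu)


def average_weights(checkpoints: list[dict[str, list[float]]]) -> dict[str, list[float]]:
    """Uniformly average parameters across checkpoints that share names and shapes."""
    n = len(checkpoints)
    return {
        name: [sum(ckpt[name][i] for ckpt in checkpoints) / n for i in range(len(values))]
        for name, values in checkpoints[0].items()
    }


# Hypothetical example: a 1B-parameter model at 1,000 tokens/s on one accelerator
# with a 15 TFLOP/s peak gives 6e12 / 1.5e13 = 0.4, i.e. 40% MFU.
print(f"{mfu(1e9, 1_000.0, 1, 1.5e13):.0%}")
```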

3. Data mixture
50% general knowledge
25% maths & reasoning
17% code data and tasks
8% multilingual data
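One simple way to realize such a mixture is to choose the source of each training example with probability equal to its weight. The sketch below assumes per-example weighted sampling with Python's `random.choices`. The thread gives only the percentages, not the sampling scheme, and the source names are hypothetical labels.

```python
import random
from collections import Counter

# Mixture percentages as fractions (they sum to 1.0); source names are hypothetical.
MIXTURE = {
    "general_knowledge": 0.50,
    "math_and_reasoning": 0.25,
    "code": 0.17,
    "multilingual": 0.08,
}


def sample_sources(n: int, mixture: dict[str, float], seed: int = 0) -> list[str]:
    """Pick a source for each of n examples, proportional to its mixture weight."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    names = list(mixture)
    return rng.choices(names, weights=[mixture[k] for k in names], k=n)


counts = Counter(sample_sources(100_000, MIXTURE))
for name, weight in MIXTURE.items():
    print(f"{name:20s} target {weight:.0%}  sampled {counts[name] / 100_000:.1%}")
```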

4. Preprocessing steps
Uses RoBERTa, DistilRoBERTa and fastText classifiers to select good-quality data. Lots of de-duplication and heuristics to remove bad data.
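Model-based quality classifiers need trained weights, so this hedged sketch covers only the de-duplication and heuristic side. It shows exact de-duplication by hashing normalized text, plus one hypothetical length heuristic (`min_words`). The function names are illustrative, not from the Llama pipeline. Production pipelines usually also add fuzzy near-duplicate detection such as MinHash, which is not shown here.

```python
import hashlib
import re

# Hypothetical helpers; the min_words threshold is an illustrative heuristic.


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies hash the same."""
    return re.sub(r"\s+", " ", text.strip().lower())


def dedup_and_filter(docs: list[str], min_words: int = 5) -> list[str]:
    """Drop too-short documents and exact duplicates (after normalization),
    keeping the first occurrence of each document in its original order."""
    seen: set[str] = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # heuristic filter: too short to be useful
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document
        seen.add(digest)
        kept.append(doc)
    return kept


docs = [
    "The quick brown fox jumps over",
    "the  quick brown fox   jumps over",
    "too short",
    "Another document with enough words here",
]
print(dedup_and_filter(docs))
```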

5. Float8 quantization
Quantizes both weights and inputs to fp8, computes the fp8 x fp8 product, then multiplies by the scaling factors; the output is bf16. Faster inference & less VRAM use.
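To make the scaling arithmetic concrete, here is a pure-Python simulation of an fp8 matrix-vector product. The thread does not specify the following details, so they are assumptions:

* the E4M3 fp8 variant (largest finite value 448);
* one scaling factor per tensor;
* an input scale computed from the input itself.

Real kernels run on fp8 hardware and emit bf16, whereas this sketch accumulates and returns ordinary Python floats. The function names are hypothetical.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format (assumed variant)


def round_to_e4m3(x: float) -> float:
    """Simulate rounding to the nearest E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # 3 mantissa bits give 8 steps per power of two; exponents below -6 are subnormal.
    e = max(math.floor(math.log2(a)), -6)
    step = 2.0 ** (e - 3)
    return sign * min(round(a / step) * step, E4M3_MAX)


def quantize(values: list[float]) -> tuple[list[float], float]:
    """Per-tensor scaling (an assumption): map the largest magnitude onto E4M3_MAX."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / E4M3_MAX
    return [round_to_e4m3(v / scale) for v in values], scale


def fp8_matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """y = W x with W and x quantized to simulated fp8; products are accumulated
    at full precision, then multiplied by both scaling factors."""
    flat_w, w_scale = quantize([v for row in w for v in row])
    xq, x_scale = quantize(x)
    cols = len(x)
    out = []
    for r in range(len(w)):
        acc = sum(flat_w[r * cols + c] * xq[c] for c in range(cols))
        out.append(acc * w_scale * x_scale)
    return out


y = fp8_matvec([[1.0, -2.0], [0.5, 3.0]], [4.0, 1.0])
print(y)  # roughly [1.93, 4.93] versus the exact [2.0, 5.0]
```

The scaling factors let the small fp8 range cover each tensor's actual magnitude. The few-percent error above is the price of keeping only 3 mantissa bits.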

6. Vision & Speech Experiments
The Llama 3.1 team also trained vision & speech adapters - not released though, but very cool!

Working on adding support into @UnslothAI!
Uploaded 4bit bitsandbytes quants for 8b and 70b (405b ongoing) to https://t.co/gHMS1CeFLF

24

992

226

534

75K

Nik Holiday @nikholiday

over 1 year ago

@elonmusk None shall share my burden https://t.co/vDlJsPM4yD

0

0

0

0

13

Nik Holiday @nikholiday

over 1 year ago

@sirajraval That might be a good case for the incentive - to be paid in BTC

1

0

0

0

60

Nik Holiday @nikholiday

over 1 year ago

@beffjezos https://t.co/jGsb4CJcFF

0

0

0

0

9

Nik Holiday @nikholiday

over 1 year ago

@sirajraval The transaction costs are high though

1

0

0

0

55

Nik Holiday @nikholiday

over 1 year ago

0

0

0

0

18

Nik Holiday @nikholiday

over 1 year ago

@sirajraval BTC value preservation might be interesting in some rare cases. A decentralized compute network doing useful AI inference work makes more sense

1

0

0

0

46

Nik Holiday @nikholiday

over 1 year ago

@beffjezos https://t.co/4XAbFtlaHZ

0

0

0

0

4

Nik Holiday @nikholiday

over 1 year ago

@beffjezos "I was able to triangulate the cell phone signal and trace the caller. His name is Adolf Hitler." - Hackerman (Leopold Nilsson)

0

0

0

0

18
