simrat hanspal @simsimsandy - Twitter Profile

Pinned Tweet

almost 2 years ago

My recent blog with @hasgeek - “Decoding Llama3” is out. It’s a deep dive into the Llama3 model code released in April this year. This is a fun blog with a code-first approach. https://t.co/ZdRHB8hW2S

0

6

3

2

363

simrat hanspal @simsimsandy

7 months ago

🫣 Softmax is unstable with very large and very small numbers. 🤓 Here is a simple illustration of how (x-max) makes softmax stable for use.

simsimsandy's tweet photo. 🫣 Softmax is unstable with very large and very small numbers.

🤓 Here is a simple illustration of how (x-max) makes softmax stable for use. https://t.co/hPqrkUx2jd

0

76

simrat hanspal @simsimsandy

7 months ago

What does it mean to have dropout in Attention computation? Dropouts are used to prevent overfitting. In case of attention, we drop some attention scores, which means that if the model learnt to attend to some token, it now has to focus on other related tokens. #LLM #Attention

0

31

simrat hanspal @simsimsandy

7 months ago

I mean token to token embedding :’D

0

1

0

48

Who to follow

ClearFeed

@clearfeedai

AI-Powered Support platform, built for teams that support customers & employees on Slack. Web: https://t.co/m4wLk5BUwD or community: https://t.co/rexShc0kQC

Kyle Platt

@Kyle__Platt

Serial Entrepreneur and Engineer. Building cool things 🧲 https://t.co/qglU2SPB25 (50k+ MRR) 🐉 https://t.co/tGKQDiS2ve (1k MRR) 🔍 https://t.co/opQXXuABH0

Prince otes

@otesmails_otes

A writer, teacher, creative designer, Air Traffic Controller and Crypto Enthusiast

simrat hanspal @simsimsandy

7 months ago

Simple illustration of what token to word embedding conversion looks like.

1

3

0

119

simrat hanspal @simsimsandy

7 months ago

So, you use len(tokenizer) Not sure why colab is not recognising len() :D

0

32

simrat hanspal @simsimsandy

7 months ago

The tokeniser lies about how many tokens it holds ;) What the tokeniser returns is the size of the base vocabulary that it learnt during training. Everything after that are special tokens. Special tokens are like metadata and help structure context.

simsimsandy's tweet photo. The tokeniser lies about how many tokens it holds ;)

What the tokeniser returns is the size of the base vocabulary that it learnt during training. Everything after that are special tokens.

Special tokens are like metadata and help structure context. https://t.co/blyNfmxV30

1

0

52

simrat hanspal @simsimsandy

7 months ago

@Sam0kayy Balcony Lizards :/

0

6

simrat hanspal @simsimsandy

7 months ago

Trivial but worth a reminder use np.matmul for dot product instead of np. dot. np. dot is meant to be a flexible function that will adjust according to the input shape, instead of raising an error. Example np. dot(np.array([[1, 2], [3, 4]]), 10)

0

23

simrat hanspal @simsimsandy

almost 2 years ago

@A_K_Nain Thank you so much for the summary👌 One question, why did they mask the prompt token in SFT?

0

109

simrat hanspal @simsimsandy

almost 2 years ago

Budding entrepreneur 🥹 I purchased more than I planned.

Zainab Bawa @zainabbawa

almost 2 years ago

And also assisting madame in coordinating logistics and order shipping the day after #fifthel @jackerhack @_waabi_saabi_

0

15

0

1K

0

7

1

0

744

simrat hanspal @simsimsandy

almost 2 years ago

@fifthel @jackerhack I hope my pending orders arrive soon 😜

1

3

0

74

simrat hanspal @simsimsandy

almost 2 years ago

@Anwxsha I waaaaaant !!

0

2

0

36

simrat hanspal @simsimsandy

almost 2 years ago

Looking forward to it.

anwesha @Anwxsha

almost 2 years ago

Tech x society enthusiasts, show up for The Fifth Elephant Annual Conference on 13th July! I'll be hosting the session on Deploying AI in Key Sectors: Robust Risk Mitigation Strategies with @jnkhyati, @bargava, @simsimsandy & @fooobar @fifthel @hasgeek @anthillin @zainabbawa

Anwxsha's tweet photo. Tech x society enthusiasts, show up for The Fifth Elephant Annual Conference on 13th July!

I'll be hosting the session on Deploying AI in Key Sectors: Robust Risk Mitigation Strategies with @jnkhyati, @bargava, @simsimsandy & @fooobar

@fifthel @hasgeek @anthillin @zainabbawa https://t.co/wfHuurOSNx

1

10

5

0

710

0

2

0

156

simsimsandy retweeted

Bengaluru Systems (fka Bengaluru Systems Meetup) @BengaluruSys

almost 2 years ago

First, @simsimsandy walked us through GPU architecture, optimizations, CUDA, and the challenges of running large ML models on GPUs, with a special look at the attention mechanism, KV-Cache optimizations, and PagedAttention!

BengaluruSys's tweet photo. First, @simsimsandy walked us through GPU architecture, optimizations, CUDA, and the challenges of running large ML models on GPUs, with a special look at the attention mechanism, KV-Cache optimizations, and PagedAttention! https://t.co/FALLciigeQ

1

12

1

0

887

simrat hanspal @simsimsandy

almost 2 years ago

Thank you for the shoutout @TheOtherRaghav. It was a lovely event. https://t.co/uHrtrpCSAJ

0

3

0

286

simrat hanspal @simsimsandy

almost 2 years ago

If you are into GenAI, @hasgeek is organizing a call today to build a community on #ResponsibleAI. Join for cross-learning. 🔗 Meeting Link: Register here to confirm your participation - https://t.co/R8WuiH76mS 🕰Time: 7 PM IST Friday, 28 June (tonight)

0

3

1

227

simsimsandy retweeted

Tune AI @Tunehq_ai

almost 2 years ago

🚀Join us in Chennai next week for our hands-on workshop: "Building AI Agents with RAG and Functions" 🤖✨ Limited seats available, so hurry and secure your spot! 🏃‍♂️💨 🔗 Register now: https://t.co/DYaWnmkREp #AIWorkshop #ChennaiEvents #llm #genai

1

7

5

3

2K

simrat hanspal @simsimsandy

almost 2 years ago

Thank you for the call out @zainabbawa :) Best wishes to all the speakers at FifthEl 2024, looking forward to networking in person.

Zainab Bawa @zainabbawa

almost 2 years ago

@anscombes4tet @Aditi_ahj @fifthel .@simsimsandy introduced Bhumika Makwana @GalaxEyeSpace who will speak about multimodal fusion as the new game changer. Reach out to Simrat for review and feedback on #nlp work, and for simplifying complex AI concepts. 3/5

1

5

1

0

512

0

2

1

0

240

simrat hanspal @simsimsandy

about 2 years ago

Really fun video on the basics - dot prd and inner prd. Also, potentially a great resource on Quantum Mechanics #QuantumSense YT channel. https://t.co/McXTOSDOed Inner product is an important concept for Rotary Positional Embedding, which is used by #LLM like #Llama3 (#Llama).

0

2

0

128

simrat hanspal

@simsimsandy

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users