TK • 木下 @wordsofteekay - Twitter Profile

Pinned Tweet

TK • 木下 @wordsofteekay

about 1 year ago

One book at a time. Better over time.

30

2K

84

846

101K

TK • 木下 @wordsofteekay

3 days ago

Another frame on Research Taste https://t.co/wocvvo9gyX

0

47

TK • 木下 @wordsofteekay

4 months ago

As an applied ML engineer who is learning more about research and theory, I read two interesting resources this week that are worth sharing.

The first one is the "On Research Taste"¹ article by Albert Ying. I liked how he defines what 'taste' really is: "the ability to find the node that would affect the largest number of other nodes [...] over a network", where the graph is a collection of "hypotheses and analyses you could pursue". I think the missing part of this short article is how to develop 'taste'.

The second one is "An Unofficial Guide to Prepare for a Research Position Application"² by Sakana AI. It is the most insightful blog post I've read this year. It lays out the core principles of being a great researcher: how to approach ideas, the importance of clear communication, and a good balance between technical ability (engineering skills) and creativity. The post is about more than how to prepare for their interview. It's their way of doing great research.

¹ https://t.co/rwKGjvhYY7
² https://t.co/qUyg8IMCTF

5

375

34

531

15K

TK • 木下 @wordsofteekay

7 days ago

His playlist is also really good. But the resources I used are not really documentation. One is a course by Stanford, and the other is a book by Sebastian Raschka. The course is a great complement because it goes beyond LLMs: it covers resource management, GPUs, tensor optimization, and parallel computation. Fun stuff.

1

0

22

TK • 木下 @wordsofteekay

9 days ago

𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗮 𝗚𝗣𝗧 𝗠𝗼𝗱𝗲𝗹

For the past few weeks, I've been reading about Foundation Models [0] and decided to work on the implementation of the GPT architecture [1] to understand its building blocks and how it works under the hood.

Here are the concepts I worked on in this implementation:

Tokenization → Embeddings → Self-Attention → Multi-Head Attention → Transformer Block → GPT Model → Pretraining.

— Tokenization: this part focused on building tokens from the input text and transforming them into token IDs, then on using a BPE tokenizer algorithm [2]
— Embeddings: representing tokens with a single scalar value (ID) is too simplistic, so embeddings are used to build richer representations. I built small embeddings for learning purposes and then increased the representation size to scale it up
— Multi-Head Self-Attention: this was one of the most interesting parts, creating attention scores and building relationships between tokens to produce context vectors
— Transformer blocks: they combine the attention heads, dropout, layer norm, and the feed-forward network
— Pretraining: this is a standard training process used for deep learning models, but in this case we update the weights end-to-end, from the embeddings to the attention layers to the feed-forward network
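
A minimal sketch of the embedding and self-attention steps from the list above, in plain Python. This is not the code from the linked repo: the toy vocabulary, `d_model`, the random weight matrices, and every function name are assumptions made up for illustration. It shows one head of causal scaled dot-product attention, where each token attends only to itself and earlier tokens, turning attention scores into context vectors.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(weight, vector):
    """Multiply a (d_out x d_in) matrix by a d_in vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def causal_self_attention(inputs, w_q, w_k, w_v):
    """Single-head attention where token i only sees tokens 0..i."""
    queries = [project(w_q, x) for x in inputs]
    keys = [project(w_k, x) for x in inputs]
    values = [project(w_v, x) for x in inputs]
    scale = math.sqrt(len(keys[0]))

    contexts, all_weights = [], []
    for i, query in enumerate(queries):
        # Attention scores: scaled dot product with current and earlier keys.
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / scale
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Context vector: weighted sum of the visible value vectors.
        context = [
            sum(weights[j] * values[j][c] for j in range(i + 1))
            for c in range(len(values[0]))
        ]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
vocab = {"one": 0, "book": 1, "at": 2, "a": 3, "time": 4}  # toy vocabulary
d_model = 4  # toy embedding size

embedding_table = random_matrix(len(vocab), d_model, rng)
w_q, w_k, w_v = (random_matrix(d_model, d_model, rng) for _ in range(3))

token_ids = [vocab[t] for t in ["one", "book", "at", "a", "time"]]
inputs = [embedding_table[i] for i in token_ids]  # token IDs -> embeddings
contexts, attn_weights = causal_self_attention(inputs, w_q, w_k, w_v)

for token_id, weights in zip(token_ids, attn_weights):
    print(token_id, [round(w, 3) for w in weights])
```

Multi-head attention would run several such heads with separate projections and concatenate their context vectors. A real implementation would use tensor operations and learned weights instead of Python lists and random values.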

The implementation was highly inspired by the Language Modeling from Scratch course [3] and the Build a Large Language Model book [4]. It's still very rudimentary, but very useful if you plan to learn these concepts in depth.

🔗 Article Link: Self-Attention, Foundation Models, and the GPT Architecture from Scratch: https://t.co/R69Heqle3P

---

In the future, I plan to write about finetuning (using foundation models and finetuning for other tasks) and optimizations (attention blocks optimization, GPU and kernel optimization).

[0] Foundation Models at Nubank: https://t.co/xLRWLPOl3o
[1] LLM implementation repo: https://t.co/STPyVTKlM4
[2] Tokenizers lecture: https://t.co/YtBodQ2SEW
[3] Language Modeling from Scratch: https://t.co/TMGTAXJmqS
[4] Build a Large Language Model: https://t.co/TawYd8Zi8M

4

81

4

62

2K

TK • 木下 @wordsofteekay

8 days ago

✨ The Art of Doing Science and Engineering — full review: https://t.co/wQyCRznfhF

0

106

TK • 木下 @wordsofteekay

almost 3 years ago

📝 I hope that with this new post you can steal some ideas and insights and put them into practice in your life. This is my reflection on reading 47 books in the first 6 months of 2023 and how I am focusing on reading less and applying more of what I read in my life. https://t.co/8f2hUvHOzx

3

134

12

137

21K

TK • 木下 @wordsofteekay

8 days ago

✨ Build a Large Language Model — full review: https://t.co/KUBUgEG95y

1

3

0

1

163

TK • 木下 @wordsofteekay

9 days ago

@yash1_ https://t.co/aXPGOCPOM0

0

1

0

10

TK • 木下 @wordsofteekay

10 days ago

✨ I worked on this article the whole day and made a lot of progress. I'm almost there. A lot of work, with many experiments, but it's getting traction. "Make Something Wonderful" inspired me to keep building and sharing.

TK • 木下 @wordsofteekay

24 days ago

[ML Grind]

Finished:
> Foundation Models: finished transformer-based model implementation from scratch + finetuning
> Finished reading the Attention-based model in the industry paper: interesting insights about context length, scaling laws, and joint fusion

Have been working on:
> ML monitoring + alerting system for ML models
> AI agent for business flow: interesting engineering learnings (agent/prompt refinements <> MCP <> backend + infra)
> Real estate liquidity model: interesting learnings about temporal splits, model calibration, model optimization, and dataset exploration

Plan for today:
> Continue writing the blog post about the foundation model implementation
> Continue the "Language Modeling from Scratch" course by Stanford
> Read a new ML paper

4

216

9

127

10K

2

16

2

11

1K

TK • 木下 @wordsofteekay

9 days ago

✨ Preview

0

3

0

4

535

TK • 木下 @wordsofteekay

9 days ago

@yash1_ I finished the writing, but I'm still working on the illustrations. Then I will carve out some time to refine it before publishing. Hopefully tomorrow (or this week)!

1

0

20

TK • 木下 @wordsofteekay

11 days ago

I finally finished this book today. What a remarkable last chapter! I'm gathering all my notes to share them online. Also, I'm looking for the next book! I accept recommendations.

1

0

112

TK • 木下 @wordsofteekay

2 months ago

📚 Started a new book today.

I'm on the first few pages, and the way it was written already caught my attention.

"Teachers should prepare the student for the student's future, not for the teacher's past. Most teachers rarely discuss the important topic of the future of their field, and when this is pointed out, they usually reply: 'No one can know the future'. It seems to me the difficulty of knowing the future does not absolve the teacher from seriously trying to help the student to be ready for it when it comes."

Excited to be educated on styles of learning and thinking, and then get back to training, applying those principles.

8

274

15

168

27K

TK • 木下 @wordsofteekay

11 days ago

This last GPU lecture (FLOPs/memory movement optimization) was awesome! ✨

0

2

0

85

TK • 木下 @wordsofteekay

20 days ago

Many people have already pointed this out, but this course by Stanford is remarkable. It's been part of the first hour of my morning: watching the lecture, taking notes, opening new tabs for the papers mentioned, and coding to build the intuition behind each lecture.

Mixture of experts was a nice lecture, but the one I liked the most so far was about PyTorch and resource accounting and how to make sense of CPU/GPU, memory, runtime/compute (FLOPs), etc., from first principles.

🔗 link: https://t.co/TMGTAXIOBk
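
A minimal sketch of the kind of first-principles resource accounting mentioned above, in plain Python. It is not taken from the course material: the function names, the 124M-parameter model, the token count, the peak throughput, and the utilization figure are all hypothetical numbers chosen for illustration. It relies on two common rules of thumb: a matrix multiply of shapes (m x k) and (k x n) costs about 2·m·k·n FLOPs, and training a dense model costs roughly 6 FLOPs per parameter per token (about 2 for the forward pass and 4 for the backward pass).

```python
def matmul_flops(m, k, n):
    """Approximate FLOPs for (m x k) @ (k x n): one multiply and one add per term."""
    return 2 * m * k * n


def training_flops(num_params, num_tokens):
    """Rule-of-thumb training cost: ~6 FLOPs per parameter per token."""
    return 6 * num_params * num_tokens


def parameter_bytes(num_params, bytes_per_param=4):
    """Memory for the weights alone (4 bytes per parameter in fp32)."""
    return num_params * bytes_per_param


def seconds_at(flops, peak_flops_per_second, utilization):
    """Wall-clock estimate given a peak throughput and the fraction achieved."""
    return flops / (peak_flops_per_second * utilization)


# Hypothetical setup: all numbers below are assumptions for illustration.
num_params = 124_000_000          # a small GPT-style model
num_tokens = 1_000_000_000        # tokens seen during training
peak = 1e14                       # assumed accelerator peak, in FLOP/s
utilization = 0.4                 # assumed fraction of peak actually achieved

total_flops = training_flops(num_params, num_tokens)
weights_gb = parameter_bytes(num_params) / 1e9
hours = seconds_at(total_flops, peak, utilization) / 3600

print(f"training FLOPs: {total_flops:.3e}")
print(f"fp32 weights:   {weights_gb:.3f} GB")
print(f"estimated time: {hours:.2f} hours")
```

The weight memory here ignores gradients, optimizer state, and activations, which usually dominate during training.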

3

153

10

174

5K

TK • 木下 @wordsofteekay

20 days ago

My notes in the repo: https://t.co/ZzZ028d7jL, even though most of my notes are written in my physical notebook. I still lack the time to move them all to the repo. Notes: https://t.co/2NGwUFU576

2

6

0

4

593

TK • 木下 @wordsofteekay

17 days ago

I've just read the "Let Me Convince You to Be Prolific" post about the benefits of being prolific, especially for creative people in the digital age. The idea is that we should create and release more experiments, creating this long tail of acceptable work:

— Experiment > Failure > Refine > Loop
— Publishing work helps people find you
— Early drafts, faster feedback loop > faster improvement
— Each experiment contributes to the following one

I noticed this about my blog, where I've been writing for 10+ years now. Each technical post I wrote helped improve the next one. None of them is perfect, but I can see how much progress I have made over time. The things you learn, the feedback you get, and the will to refine your work lead to mastery. And the long tail of work starts to compound and helps people discover you.

There are these two quotes I liked:

> "Giving up on perfectionism doesn’t mean that you will not produce anything perfect, but rather that perfection will happen from time to time because of the sheer mass of output." — Dean Keith Simonton
> "If you can write one short story a week — it doesn’t matter what the quality is to start, but at least you’re practicing, and at the end of the year you have 52 short stories, and I defy you to write 52 bad ones." — Ray Bradbury

I found this blog in @noghartt's bookmarks. There's an awesome curation there.

→ Blog: https://t.co/LBI4yNF1d8

1

181

13

192

6K

TK • 木下 @wordsofteekay

17 days ago

@Swarnav13 https://t.co/XLCMCgsl64 Also, a repo filled with resources: https://t.co/ZzZ028d7jL

0

19

TK • 木下 @wordsofteekay

18 days ago

I've just found out about this course on Foundation Models and Generative AI. Quite interesting lectures. I plan to watch the lectures as soon as I finish the Language Modeling from Scratch course. So many interesting things to learn. https://t.co/ZeuuoQMR2a

3

160

10

175

6K

TK • 木下 @wordsofteekay

18 days ago

@CausalFlops28 I will find out as soon as I finish the Language Modeling from Scratch one.

1

0

97

TK • 木下 @wordsofteekay

20 days ago

@engqsucessor Weekdays usually look like this:
6h-9h: study/research
9h-18h: work
30min run + shower + dinner + family

1

0

110

TK • 木下 @wordsofteekay

20 days ago

@CausalFlops28 Manually, unfortunately. This is why most of my notebook notes are not transferred into the repo. But it's still a great tool to augment my thinking.

0

32
