𝕋𝕜

@tkptbr

Joined April 2020

1 Following

74 Followers

43 Posts

Pinned Tweet

𝕋𝕜 @tkptbr

8 months ago

Because this world is too noisy. Don't listen. Don't look. Don't mind it. And move forward.

tkptbr retweeted

TK • 木下 @wordsofteekay

2 days ago

This resonates a lot with my experience. My record was 60 books a year (not 80 in 6 months, though). Because I'm curious about a lot of things, many topics get my attention, so the "Parallelize" (books) tip is a really effective way to read more books. I read 3-4 at the same time, a bit every day, consistently. It turns out to be much easier to do, and in the long term, I accomplish more.

Reading a lot also made me rethink which books I choose to read (reading less → reading better books: https://t.co/15bp8ZjRIm). And because I usually read technical and non-fiction books, it's great to re-read them, take notes, and think of ways to apply the ideas in my life (https://t.co/4r4rNBruhE).

"How To Read More" by Borretti: https://t.co/DW22tUxm7j

tkptbr retweeted

TK • 木下 @wordsofteekay

9 days ago

𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗮 𝗚𝗣𝗧 𝗠𝗼𝗱𝗲𝗹

For the past few weeks, I've been reading about Foundation Models [0] and decided to work on the implementation of the GPT architecture [1] to understand its building blocks and how it works under the hood.

Here are the concepts I worked on in this implementation:

Tokenization → Embeddings → Self-Attention → Multi-Head Attention → Transformer Block → GPT Model → Pretraining.

> Tokenization: building tokens from the input text and transforming them into token IDs, then using a BPE tokenizer algorithm [2]
> Embeddings: representing tokens with a simple scalar value (ID) is too simplistic, so embeddings build richer representations. I built small embeddings for learning purposes and then increased their size to scale up
> Multi-Head Self-Attention: this was one of the most interesting parts, creating attention scores and building relationships between tokens to produce context vectors
> Transformer blocks combine the attention heads, dropout, layer norm, and the feed-forward network
> Pretraining is a standard training process used for deep learning models. In this case, the weights are updated end-to-end, from the embeddings through the attention layers to the feed-forward network
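To make the tokenization step above more concrete, here is a minimal byte-level BPE training sketch in plain Python. It is not the code from the linked repo. The function names (`most_frequent_pair`, `merge_pair`, `train_bpe`) and the toy input string are made up for illustration, and real tokenizers add pre-tokenization and special tokens that are left out here.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of token IDs, or None if there is none."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))  # initial vocabulary: the 256 byte values
    merges = {}
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # new token IDs start right after the byte range
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


# Toy example (hypothetical input text).
ids, merges = train_bpe("low lower lowest", num_merges=3)
print(ids)
print(merges)
```

Each merge shortens the token sequence and adds one entry to the vocabulary, which is the trade-off a BPE tokenizer tunes with its number of merges.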

The implementation was highly inspired by the Language Modeling from Scratch course [3] and the Build a Large Language Model book [4]. It's still very rudimentary, but very useful if you plan to learn these concepts in depth.

🔗 Article Link: Self-Attention, Foundation Models, and the GPT Architecture from Scratch: https://t.co/R69Heqle3P

---

In the future, I plan to write about finetuning (using foundation models and finetuning for other tasks) and optimizations (attention blocks optimization, GPU and kernel optimization).

[0] Foundation Models at Nubank: https://t.co/xLRWLPOl3o
[1] LLM implementation repo: https://t.co/STPyVTKlM4
[2] Tokenizers lecture: https://t.co/YtBodQ2SEW
[3] Language Modeling from Scratch: https://t.co/TMGTAXJmqS
[4] Build a Large Language Model: https://t.co/TawYd8Zi8M

tkptbr retweeted

TK • 木下 @wordsofteekay

17 days ago

I've just read the "Let Me Convince You to Be Prolific" post about the benefits of being prolific, especially for creative people in the digital age.

The idea is that we should create and release more experiments, creating a long tail of acceptable work:

- Experiment > Failure > Refine > Loop
- Publishing work helps people find you
- Early drafts, faster feedback loop > faster improvement
- Each experiment contributes to the following one

I noticed this with my blog, where I've been writing for 10+ years now. All the technical posts I wrote helped improve the next one. None of them is perfect, but I can see how much progress I have made over time. The things you learn, the feedback you get, and the will to refine your work lead to mastery. And the long tail of work starts to compound and helps people discover you.

There are two quotes I liked:

"Giving up on perfectionism doesn’t mean that you will not produce anything perfect, but rather that perfection will happen from time to time because of the sheer mass of output." (Dean Keith Simonton)

"If you can write one short story a week — it doesn’t matter what the quality is to start, but at least you’re practicing, and at the end of the year you have 52 short stories, and I defy you to write 52 bad ones." (Ray Bradbury)

I found this blog in @noghartt's bookmarks. There's an awesome curation there.

→ Blog: https://t.co/LBI4yNF1d8

tkptbr retweeted

TK • 木下 @wordsofteekay

10 days ago

✨ I worked on this article the whole day and made a lot of progress. I'm almost there. A lot of work, with many experiments, but it's getting traction. "Make Something Wonderful" inspired me to keep building and sharing.

tkptbr retweeted

TK • 木下 @wordsofteekay

18 days ago

I've just found out about this course on Foundation Models and Generative AI. Quite interesting lectures. I plan to watch the lectures as soon as I finish the Language Modeling from Scratch course. So many interesting things to learn. https://t.co/ZeuuoQMR2a

tkptbr retweeted

TK • 木下 @wordsofteekay

20 days ago

Many people have already pointed this out, but this course by Stanford is remarkable. It's been part of the first hour of my morning: watching the lecture, taking notes, opening new tabs for the papers mentioned, and coding to build the intuition behind each lecture.

Mixture of experts was a nice lecture, but the one I liked the most so far was about PyTorch and resource accounting and how to make sense of CPU/GPU, memory, runtime/compute (FLOPs), etc., from first principles.
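As a small illustration of the kind of resource accounting mentioned above, here is a back-of-the-envelope sketch. The formulas are widely used rules of thumb for dense models (2 FLOPs per multiply-add in a matrix product, roughly 6 FLOPs per parameter per training token), not something quoted from the post, and the model size and token count are hypothetical.

```python
def matmul_flops(m, k, n):
    """FLOPs for an (m x k) @ (k x n) matrix product: one multiply and one add per term."""
    return 2 * m * k * n


def training_flops(num_params, num_tokens):
    """Approximate training FLOPs: ~2 per parameter per token forward, ~4 backward."""
    return 6 * num_params * num_tokens


def parameter_bytes(num_params, bytes_per_param=4):
    """Memory needed just to store the parameters (4 bytes each in float32)."""
    return num_params * bytes_per_param


# Hypothetical model: 124M parameters trained on 1B tokens.
params, tokens = 124_000_000, 1_000_000_000
print(f"training FLOPs ~ {training_flops(params, tokens):.2e}")
print(f"float32 parameter memory ~ {parameter_bytes(params) / 1e9:.2f} GB")
```

Optimizer state, gradients, and activations add to the memory figure, so the parameter count alone is a lower bound.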

🔗 link: https://t.co/TMGTAXIOBk

tkptbr retweeted

TK • 木下 @wordsofteekay

23 days ago

[Paper Reading: Your Spending Needs Attention]

I've just finished reading the "Your Spending Needs Attention" paper by Nubank, and not only are the results impressive, but the ML and engineering approach is also very interesting. It shows the power of self-supervised representation learning to automatically understand user behavior from raw (transaction) data, which made me think about how many insightful representations we are missing by not using it, and why (engineering and money trade-offs come to mind).

Here's the research breakdown: causal self-attention + tabular feature embedding + fine-tuning for RecSys.

Transformer-based model:
> Text is All You Need: Individual transactions are tokenized, concatenated into a transaction string, and fed through a Transformer [0] to produce a transaction sequence embedding.
> No Positional Embeddings (NoPE) [1]: drop the temporal information
> FlashAttention [2] + NoPE = Efficient Long Contexts (transaction = ~14 tokens — the sequence gets large very fast): the model can train on much larger context lengths

Tabular Features:
> Feature embeddings for numerical and categorical variables
> LightGBM: gradient-boosted tabular modeling
> Deep Cross Network V2 (DCNv2) [3]: learn feature interactions

Fine-Tuning — classification task for RecSys:
> Low-Rank Adaptation (LoRA) [4]: injecting trainable low-rank matrices into attention layers to handle the "overfitting and catastrophic forgetting" issues.
> Late Fusion: freeze the transformer embeddings and use them as static features passed into LightGBM or DCNv2 independently.
> Joint Fusion (nuFormer): keep the transformer embeddings trainable end-to-end alongside the tabular features.

It's very insightful how joint fusion trains the entire system end-to-end using a DNN, so gradients can flow back into the transformer embeddings, which is not possible with GBT.
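As a rough illustration of the LoRA item in the fine-tuning list above, the sketch below adds a trainable low-rank update to a frozen weight matrix. It is a generic plain-Python version of the technique, not the paper's implementation. The names `lora_forward` and `trainable_params`, the scaling by `alpha / rank`, and all matrix values are assumptions chosen for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w_frozen, lora_a, lora_b, alpha):
    """Compute x @ W + (alpha / rank) * (x @ A) @ B, where only A and B are trained.

    Shapes: x (n, d_in), w_frozen (d_in, d_out), lora_a (d_in, rank), lora_b (rank, d_out).
    """
    rank = len(lora_b)
    base = matmul(x, w_frozen)
    update = matmul(matmul(x, lora_a), lora_b)
    scale = alpha / rank
    return [
        [p + scale * u for p, u in zip(base_row, update_row)]
        for base_row, update_row in zip(base, update)
    ]


def trainable_params(d_in, d_out, rank):
    """Number of parameters in the two low-rank matrices."""
    return d_in * rank + rank * d_out


# Toy example: B starts at zero, so the adapted layer initially matches the frozen one.
x = [[1.0, 2.0, 3.0, 4.0]]
w = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
a = [[0.1], [0.2], [0.3], [0.4]]
b = [[0.0, 0.0, 0.0, 0.0]]
print(lora_forward(x, w, a, b, alpha=2.0))
print(trainable_params(768, 768, rank=8), "trainable vs", 768 * 768, "frozen")
```

Because the frozen weights never change, the pretrained behavior is preserved at initialization and only a small number of parameters can drift during fine-tuning.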

Other insightful ideas from the paper:
> Context window problem: adding more data sources (e.g. financial products) can lead to worse results because each data source will "compete" for the available tokens for a fixed context window.
> Scaling laws: larger model size, context lengths, and data volume lead to improved performance.

There are still many interesting avenues they will explore, especially scaling laws and scaling the application to other products. It was also insightful how they are not just following the state of the art, but doing research to find new ideas [5].
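To put rough numbers on the context window problem listed above, here is a tiny arithmetic sketch. The figure of about 14 tokens per transaction comes from the post. The context length of 2048, the equal split between sources, and the assumption that every source costs the same number of tokens per item are hypothetical simplifications.

```python
TOKENS_PER_TRANSACTION = 14  # approximate figure quoted in the post


def items_per_source(context_length, num_sources=1, tokens_per_item=TOKENS_PER_TRANSACTION):
    """Items each source can contribute when sources share a fixed window equally."""
    return context_length // (num_sources * tokens_per_item)


# Hypothetical context length of 2048 tokens.
for sources in (1, 2, 4):
    print(sources, "source(s):", items_per_source(2048, sources), "items each")
```

Every extra source shortens the history available to each of the others, which is one way to read the "competition" for tokens described in the list.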

---
Paper: https://t.co/QJYpVN6NBD

---
[0] https://t.co/VNdFcLByqi
[1] https://t.co/xZ4C4eBVhp
[2] https://t.co/gR1GWBelnO
[3] https://t.co/TCT2b0633O
[4] https://t.co/jeZHOn9EgR
[5] https://t.co/CAWJePsYXQ

tkptbr retweeted

TK • 木下 @wordsofteekay

24 days ago

[ML Grind]

Finished:
> Foundation Models: finished transformer-based model implementation from scratch + finetuning
> Finished reading the paper on attention-based models in industry: interesting insights about context length, scaling laws, and joint fusion

Have been working on:
> ML monitoring + alerting system for ML models
> AI agent for business flow: interesting engineering learnings (agent/prompt refinements <> MCP <> backend + infra)
> Real estate liquidity model: interesting learnings about temporal splits, model calibration, model optimization, and dataset exploration

Plan for today:
> Continue writing the blog post about the foundation model implementation
> Continue the "Language Modeling from Scratch" course by Stanford
> Read a new ML paper

tkptbr retweeted

TK • 木下 @wordsofteekay

24 days ago

For as long as I can remember, I have always had this desire to do great things. Not only making something wonderful, but striving to become great. Yet another day, I wake up with these thoughts. Let's refine my skills, work on my projects, and go one step further in this infinite game of life.

tkptbr retweeted

TK • 木下 @wordsofteekay

about 1 month ago

[ML Grind]

Yesterday I took the day to work on the model training of the GPT-like model. I built the tokenization/embedding layers, the multi-head attention mechanism, added the transformer blocks to the GPTModel, and trained it on input text of 5k tokens (not big but useful for learning purposes).
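The multi-head attention mechanism mentioned above builds on single-head causal self-attention, which can be sketched in plain Python as follows. This is an illustrative version, not the code from the author's model. The helper names, the identity projection matrices, and the three toy token vectors are assumptions, and a real implementation would use a tensor library with learned weights and several heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def causal_self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product attention where token i only attends to tokens 0..i."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    context, weights = [], []
    for i, q_i in enumerate(q):
        # Causal mask: score only the current token and the ones before it.
        scores = [
            sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        w = softmax(scores)
        weights.append(w + [0.0] * (len(x) - i - 1))  # future positions get weight 0
        context.append(
            [sum(w[j] * v[j][c] for j in range(i + 1)) for c in range(len(v[0]))]
        )
    return context, weights


# Toy example: 3 tokens with 2-dimensional embeddings and identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
context, weights = causal_self_attention(x, identity, identity, identity)
for row in weights:
    print([round(value, 3) for value in row])
```

Each row of `weights` sums to 1 and is zero to the right of the diagonal, so every context vector is a weighted average of the current and earlier tokens only.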

tkptbr retweeted

TK • 木下 @wordsofteekay

about 1 month ago

Continuing my ML progress

> LLM from scratch: worked on this all day (built a self-attention and multi-head attention mechanism)
> Finished the monitoring system this week
> AI Engineering: continuing the book. I'm currently working on an AI agent product and need to learn more about this topic
> Got a mentor at work: he shared many papers and resources I should read (tons of work to do!)
> ML Bootcamp: working on the first project with my pair. The first part (EDA) is done; now I need to move to the model training phase

tkptbr retweeted

TK • 木下 @wordsofteekay

about 2 months ago

It's been almost 2 months since I started working on ML, and it's been one of the best decisions of my career. The learning curve, the knowledge gap, the interesting projects, and the people I'm working with are all exciting. I'm having so much fun at work and outside it.

The cherry on top is the ML/AI bootcamp provided by my company. They built a bootcamp based on ML theory and hands-on projects, and we need to study and deliver the exercises and projects. It's an intensive 3-month bootcamp on traditional ML and AI agents.

I keep following my curiosity and opportunities for growth. So much to learn.

tkptbr retweeted

TK • 木下 @wordsofteekay

about 2 months ago

[ML Grind] Today's study session.

> Building an LLM from scratch
> RL course + book
> Decoding AlphaFold + ML research
> Finishing the feature store and monitoring system implementation

So much to learn.

tkptbr retweeted

TK • 木下 @wordsofteekay

about 2 months ago

[ML Grind] Goals for Today

> Continue studying AlphaFold 2 and 3
> Finish the first coding assignment for the Language Modeling from Scratch course
> Continue designing the ML monitoring system for my model
> Continue RL course
> Read AlphaFold cases: starting with IsoLabs

---
Besides the ML grind, I still need to run my 5k, clean the house, and do meal prep for the week. Let's go!

tkptbr retweeted

TK • 木下 @wordsofteekay

2 months ago

The Infinity Machine will definitely be the next book I want to read. It's a book about Demis, DeepMind, and their work on AI.

If you're curious about it, you should give this Founders podcast a try: https://t.co/W8mngJrGp8

I like this podcast in general, but this episode about Demis and how he works is fascinating. The passage I liked the most was about his determination and being mission-driven 24/7:

"There is no 50 percent mode in Demis. There is no 99 percent mode in Demis. There is only 100 percent."

As Kpaxs said, "Some people are playing a completely different game, 24/7. No off switch".

tkptbr retweeted

TK • 木下 @wordsofteekay

2 months ago

Last week, I read this very insightful blog post by @ZyWang25 titled "How I become a Research Engineer at Google DeepMind".

It's not only an inspiring and amazing accomplishment, but it also resonates with those of us who are following our curiosity, looking for this inner motivation (or passion, as other people say), improving our craft, and reaching our purpose.

Before I have the chance to write and share my own post about my experience, read this piece to feel inspired and motivated to keep pushing and grinding.

Here are the topics that resonated with me:

> Find your 'why'
> Upskill relentlessly. Do the work!
> Productivity = Progress: move closer to your goal
> Create your opportunities. Manufacture luck by working hard on your craft and being strategic about your goals

tkptbr retweeted

TK • 木下 @wordsofteekay

2 months ago

[ML Grind]

Focusing on foundation work:
> Deep Learning/LLM/ML foundation studies
> Bio x AI research: unwrapping AlphaFold
> Finished the Machine Learning System Design book

Documenting everything in my physical notebook and the ML research repo: https://t.co/yKNjawriYI

tkptbr retweeted

TK • 木下 @wordsofteekay

2 months ago

2 years ago, I started learning ML for fun, and then, after learning more about Hamming's ideas, I decided to take it seriously to accomplish my life's big goals.

I'm still in the process, but I'm starting to get the rewards and make progress.

→ post about my ML learning experience: https://t.co/qL4R1olj41
→ post about my learning roadmap: https://t.co/7oLBZJN1RI

There is so much to learn, still.

tkptbr retweeted

TK • 木下 @wordsofteekay

2 months ago

So many books, so little time. Besides The Art of Doing Science and Engineering, I'm excited to read Sutskever's List and The Infinity Machine, a book released a couple of days ago. Time to remove all distractions and focus.

tkptbr retweeted

TK • 木下 @wordsofteekay

2 months ago

📚 Started a new book today.

I'm on the first few pages, and the way it was written already caught my attention.

"Teachers should prepare the student for the student's future, not for the teacher's past. Most teachers rarely discuss the important topic of the future of their field, and when this is pointed out, they usually reply: 'No one can know the future'. It seems to me the difficulty of knowing the future does not absolve the teacher from seriously trying to help the student to be ready for it when it comes."

Excited to be educated on styles of learning and thinking, and then get back to training, applying those principles.
