Harish @HarishBoke - Twitter Profile

Harish @HarishBoke

4 months ago

“In every disruption, the fearful protect the past. The leaders build the future.” #thoughtoftheday

0

27

HarishBoke retweeted

Andrej Karpathy

@karpathy

9 months ago

Excited to release new repo: nanochat!
(it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:

- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Efficient inference the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUI.
- Write a single markdown report card, summarizing and gamifying the whole thing.

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. About ~12 hours surpasses GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and 70s on ARC-Easy, 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.

Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.

681

24K

3K

18K

6M
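
As a rough check on the cost figures quoted in the nanochat tweet above, the sketch below derives an hourly node rate from the "~$100 for ~4 hours on an 8XH100 node" figure and applies it to the other durations mentioned. The flat-rate assumption and the function names are illustrative and are not taken from the nanochat repo. The tweet's own "~$1000 for ~41.6 hours" implies a slightly lower rate (1000 / 41.6 ≈ $24 per hour), so these numbers are approximate.

```python
# Approximate figures quoted in the tweet; treated here as exact for illustration.
BASE_COST_USD = 100.0   # "~$100 in cost"
BASE_HOURS = 4.0        # "~4 hours on an 8XH100 node"
GPUS_PER_NODE = 8       # eight H100 GPUs in the node


def node_rate_per_hour() -> float:
    """Hourly cost of the whole node, assuming a flat rate (an assumption)."""
    return BASE_COST_USD / BASE_HOURS


def gpu_rate_per_hour() -> float:
    """Hourly cost per GPU under the same flat-rate assumption."""
    return node_rate_per_hour() / GPUS_PER_NODE


def estimated_cost(hours: float) -> float:
    """Estimated cost in USD of running the node for the given number of hours."""
    return hours * node_rate_per_hour()


if __name__ == "__main__":
    print(f"node rate: ${node_rate_per_hour():.2f}/h, "
          f"per GPU: ${gpu_rate_per_hour():.3f}/h")
    # Durations mentioned in the tweet: 4 h, ~12 h, 24 h, ~41.6 h.
    for hours in (4, 12, 24, 41.6):
        print(f"{hours:>5} h -> ~${estimated_cost(hours):,.0f}")
```

Under this assumption the ~12-hour run that surpasses the GPT-2 CORE metric would cost roughly three times the ~$100 speedrun.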

HarishBoke retweeted

ADG PI - INDIAN ARMY

@adgpi

about 1 year ago

#PahalgamTerrorAttack Justice is Served. Jai Hind!

22K

438K

104K

10K

23M

Harish @HarishBoke

about 1 year ago

Darkness lies ahead, but this time, it leads to light—one so bright, it asks me to close my eyes and trust the journey. #Setoo #SetooAI #reflecting

0

2

0

71

Harish @HarishBoke

over 1 year ago

Ignite your creativity with Setoo AI on Call! 💡 From brainstorming to brilliant execution, SetooAI fuels your imagination and provides the assistance you need to bring your projects to life. #AICreativity #SetooAI #Harishboke #AIexpert #AISolution #AIcreater #SmartThinking

0

1

0

56

Harish @HarishBoke

over 1 year ago

🚀 Thrilled to launch our AI Call System Bigger | Better | Beyond! 🤖✨ Highlights: ✔️ Easy scheduling ✔️ Real-time reviews ✔️ Personalized AI experiences Missed the demo? Visit 👉 https://t.co/7g70EJpsSm | https://t.co/4VQCei8D7t #Setoo #AI #TechSolutions #harishboke

0

38

Harish @HarishBoke

over 1 year ago

Thought of the day! The best teams don’t just report status; they share wins, flag blockers, and align on solutions. Let’s be the best one! ☝️

0

1

0

40

Harish @HarishBoke

almost 2 years ago

No Clarity 🌫️ No Process 📝 hence no progress 📈 #MondayMotivation #leadership #setoo #QuoteForTheDay #visionDrivenLeadership

0

1

0

65

Harish @HarishBoke

almost 2 years ago

@ashwinphatak Congratulations 🎉

0

1

0

21

Harish @HarishBoke

about 2 years ago

Check out my latest article: Unlock the Power of Conversational AI for your Business! https://t.co/FQnmE1IvFE #setoo #ai #ml

0

1

0

41

Harish @HarishBoke

about 2 years ago

Pain of Growth ↗️ ↙️Suffering of comfort zone #sleeplessNight #pace #businessGrowth #decisionMaking

0

2

0

31

Harish @HarishBoke

about 2 years ago

What is stronger than individuals? Processes. #edgeOfKnowledge Strong processes and risk management help you achieve your goals steadily and sustainably. https://t.co/I7H7fmS07p

0

2

0

19

Harish @HarishBoke

over 3 years ago

@matejlatin RHYTHM IN WEB TYPOGRAPHY. It's great content, and I am looking for help adopting it for my website. Are there any leads or resources you can point me to? Appreciate the help! Thank you

0

Harish @HarishBoke

almost 4 years ago

@sketchUcation Trying to purchase a plugin after the 30-day trial. The payment mode only shows PayPal, and it sucks! It's not working after trying multiple times. Maybe it has stopped supporting Indian payment systems. Is there any other way, @SketchucationA? #blocked @SketchUp @lumion3d

0