@TVMohandasPai It's true that some advanced models are expensive right now, but that is purely down to ignorance at Indian IT companies... Indian IT companies need to fire many of their aging leaders...
To train a GPT-class 1T model from scratch - including failed runs, data acq + cleaning + RLHF, post-training, and team/people - will likely require $250M of compute on an aggressive 3-4 month schedule (i.e. more reserved GPUs), and $500-600M all-in IF you do a dense one. MoE + fp8 will cut costs by 1/10th depending on how many active params you have. If you want SOTA, however, the budgets go significantly higher on test-time compute, post-training RL, and data/synthetic generation... and very high on talent. Maybe $2-4B all-in. After that comes serving the model. Talent is key to reaching SOTA or beating it - and then you have to ensure the model is useful enough to sustain inference volume over time - for which the capital will come if there is usage / TAM. So this is not so much about raising $50-60B, or raising it all at once as the OP says - we are investors in Mistral, Sarvam, Reflection and Anthropic - and they all scaled capital over time as their models got adoption. The early bottleneck is more on talent + GPUs at the scale where you can do interesting things.
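For a rough sense of how the compute part of such an estimate is put together, here is a minimal Python sketch. It uses the common rule of thumb of about 6 FLOPs per parameter per training token. Every input below (token count, per-GPU throughput, utilization, price per GPU-hour) is a hypothetical placeholder and not a figure from the post. For an MoE model, the active parameter count is passed in instead of the total, which is why cost depends on how many active params you have. fp8 is not modeled separately; it would show up as higher effective throughput per GPU.

```python
def training_compute_cost(params, tokens, peak_flops_per_gpu, mfu, usd_per_gpu_hour):
    """Back-of-envelope cost of ONE training run (no failed runs, no people, no data).

    params: parameters used per token (total for dense, active for MoE)
    tokens: number of training tokens
    peak_flops_per_gpu: peak FLOP/s of one GPU at the chosen precision
    mfu: fraction of peak actually achieved (model FLOPs utilization)
    usd_per_gpu_hour: rental or amortized price of one GPU-hour
    """
    total_flops = 6.0 * params * tokens           # ~6 * N * D approximation
    effective_flops = peak_flops_per_gpu * mfu    # sustained FLOP/s per GPU
    gpu_hours = total_flops / effective_flops / 3600.0
    return gpu_hours, gpu_hours * usd_per_gpu_hour


# Hypothetical inputs, chosen only for illustration.
ASSUMED_TOKENS = 2e13        # assumption: 20T training tokens
ASSUMED_PEAK_FLOPS = 1e15    # assumption: ~1 PFLOP/s peak per GPU
ASSUMED_MFU = 0.4            # assumption: 40% utilization
ASSUMED_PRICE = 2.0          # assumption: $2 per GPU-hour

dense_hours, dense_cost = training_compute_cost(
    1e12, ASSUMED_TOKENS, ASSUMED_PEAK_FLOPS, ASSUMED_MFU, ASSUMED_PRICE)
moe_hours, moe_cost = training_compute_cost(
    1e11, ASSUMED_TOKENS, ASSUMED_PEAK_FLOPS, ASSUMED_MFU, ASSUMED_PRICE)

print(f"dense 1T: {dense_hours:.3g} GPU-hours, ${dense_cost:,.0f}")
print(f"MoE, 100B active: {moe_hours:.3g} GPU-hours, ${moe_cost:,.0f}")
```

This covers a single clean run only. Failed runs, post-training, data work, and people all sit on top of it, so the result is at best a floor on the compute line, not an all-in budget.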
@TVMohandasPai @RajivMessage Just because the people you are referring to built things doesn't mean they are always right and can't be criticized... that mentality only damages the future.
@TVMohandasPai Building the most advanced AI model requires talent retention, financial planning, and most importantly conviction, belief, and a hunger for it... none of this exists among Indian IT people.
@TVMohandasPai Really? The govt should definitely exclude people from the IT field, especially you and Nandan. Bring some Indians back here with hefty pay. People from IT like you expect talent to work at 4 LPA and build foundational models. Stupidity is at its peak.
@TVMohandasPai And appointing Nandan would be catastrophic for the Indian AI revolution. He is not fit for that role, and neither are people from the Indian IT services giants.
I'm a technology optimist. I've spent four decades studying disruptive innovation, from the microprocessor, the internet, and mobile phones to OpenAI. I'm certain AI will do 80% of the economically valuable work humans do today, for 80% of all jobs, faster than most believe. The question isn't whether mass underemployment arrives, but whether we have a policy framework ready. Right now we don't.