Vivek Phalak

@vphalak

Entrepreneur | CTO/ Co-founder Shoptimize | Data Engineering, AI | ex-Oracle | CMU Alum

Pune, India

Joined January 2009

631 Following

63 Followers

69 Posts

vphalak retweeted

Jason Wei

@_jasonwei

over 1 year ago

An underrated but occasionally make-or-break skill in AI research (that didn’t really exist ten years ago) is the ability to find a dataset that actually exercises a new method you are working on. Back in the day when the bottleneck in AI was learning, many methods were dataset-agnostic; for example, a better optimizer would be expected to improve on both ImageNet and CIFAR-10. Nowadays language models are so multi-task that the answer to whether something works is almost always “it depends on the dataset”. A common example of this is the question, “on what datasets does chain of thought improve performance?” A recent paper even argued (will link below) that CoT mainly helps on math/logic, and I think that is both a failure of imagination and a lack of diverse evals. Naively you might try CoT models on 100 random user chat prompts and not see much difference, but this is because the prompts were already solvable without CoT. In fact there is a small and very important slice of data where CoT makes a big difference—the obvious examples are math and coding, but include almost any task with asymmetry of verification. For example, generating a poem that fits a list of constraints is hard on the first try but much easier if you can draft and revise using CoT. As another made-up example, let’s say you want to know if browsing improves performance on geology exams. Maybe using browsing on some random geology dataset didn’t improve performance. The important thing to do here would be to see if the without-browsing model was actually suffering due to lack of world knowledge—if it wasn’t, then this was the wrong dataset to try browsing on. In other words you should hesitate to draw a conclusion like “X method doesn’t work” without ensuring that the dataset used for testing actually exercises that method. The inertia from five years ago is to take existing benchmarks and try to solve them, but nowadays there is a lot more flexibility and sometimes it even makes sense to create a custom dataset to showcase the initial usefulness of an idea. Obviously the danger with doing this is that a contrived dataset may not represent a substantial portion of user queries. But if the method is in principle general I think this is a good way to start and something people should do more often.

691

445

83K

vphalak retweeted

Shreya Shankar

@sh_reya

almost 2 years ago

yep people don't realize the significant mlops effort required with custom fine-tuned models & thus accrue technical debt. data preparation for fine-tuning is way harder than the fine-tuning itself, & you have to be committed to regularly repreparing and retraining on new data

16K

vphalak retweeted

Prasann Pandya

@prasann_pandya

almost 2 years ago

Calling it now: After coming back to SF, it's clear to me we are in an AI bubble. Too many AI founders are building picks and shovels just because VCs want them to. Very few founders building from personal problems and passion, most just building because that's what everyone else is doing. Very few actually solving real problems.

109

126

566

515K

vphalak retweeted

Ajit Pawar @misfit_brain

about 2 years ago

Indeed it was a great event..Thanks to Ayush Jain and mindbowser to hosting it and @vphalak and Devendra Laulkar for sharing insight on the GenAI and making a worthy Saturday morning 👏 Adding some more pics here 🙌

misfit_brain's tweet photo. Indeed it was a great event..Thanks to Ayush Jain and mindbowser to hosting it and @vphalak and Devendra Laulkar for sharing insight on the GenAI and making a worthy Saturday morning 👏
Adding some more pics here 🙌 https://t.co/oj53yJ6Nyt

116

Who to follow

Product Manager and start-ups. All sports enthusiast. Cricket and Football player. F1 is the latest addition. Thinking about a lot of things at the same time.

Andromeda

@Androme47387300

I can drink water without chewing

vphalak retweeted

Hamel Husain

@HamelHusain

about 2 years ago

Below is an explanation of "Agentic" workflows in ~ 15 LOC. At its core, "Agentic" just means LLMs that can call functions. Another ex of how unnecessary jargon confuses people https://t.co/w0YYsxoyLT

HamelHusain's tweet photo. Below is an explanation of "Agentic" workflows in ~ 15 LOC. At its core, "Agentic" just means LLMs that can call functions.

Another ex of how unnecessary jargon confuses people

https://t.co/w0YYsxoyLT https://t.co/grydeSebTD

530

453

50K

Vivek Phalak @vphalak

about 2 years ago

How long till our AWS contract will be sung to us 😁

Riley Goodside

@goodside

about 2 years ago

AI-generated sad girl with piano performs the text of the MIT License

236

876K

Vivek Phalak @vphalak

over 2 years ago

@gokulr This applies to technical roles as well. The difference is that you can't hire good enough generalists out of college. However, if they get an effective mentor and learn fast, they can get good at several skills before specializing. How many founders can/want to?

299

Vivek Phalak @vphalak

over 2 years ago

@johnrushx Great points. Scrum is a scam. Yes. More people need to understand this. It isn't bad in theory, but that's just in theory.

Vivek Phalak @vphalak

over 2 years ago

@gwenshap Back in the day, you were forced to learn these concepts as well as low level OS. Databases were nowhere as self managed as today. Recovering a failed Oracle DB or performance tuning meant popping up the hood every single time.

vphalak retweeted

Bindu Reddy

@bindureddy

over 2 years ago

The Delicate Dance of Large Language Models - Memorization vs. Generalization Critics of LLMs claim that they are stochastic parrots that do not do much more than memorize the world's information. Others, however, point to examples of LLMs displaying logic and reasoning abilities and claim they are beginning to exhibit emergent properties and may evolve to be a superintelligence that threats our very existence It turns out that LLMs are actually pretty good at memorization. Data Imprinting: During the training phase, a language model essentially encodes specific data points into the weights of the neural network. This is akin to how you remember the capital of France is Paris. N-gram Storage: Essentially, a trained model will have "memorized" a lot of sequences of words (N-grams), but this isn't memory in the way humans understand it. It's a complex mathematical representation that allows LLMs to predict the next word in a sentence based on the preceding words. This means that LLMs are extremely good at retrieving pretty much any data that they have been trained on and presenting it in a coherent way. In other words, LLMs are capable of memorizing without overfitting This is also why GPT-4 is exceptionally good at answering questions on information BEFORE its training cut-off date of September 2021 and doesn't do well on information after that cut-off date. In addition, LLMs don't have the capability of learning on the fly - Any information that you send it in real-time isn't memorized making them far less capable than the human brain Generalization: The real magic happens when LLMs can take what they've learned and apply it to new situations. Fundamentally, generalization happens because language models map linguistic constructs into a high-dimensional vector space, wherein each dimension could hypothetically represent a unique linguistic feature—be it syntactical, semantic, or otherwise. Predictive capabilities in this space enable generalization to previously unseen data. Each sentence or word is represented in this mathematical space with similar sentences like "How is the weather" or "Tell me more about the climate" being close together. This space is what enables LLMs to "understand" the language. When a new sentence is introduced, LLMs can find its place in this semantic universe based on its features. This helps it generate responses that are contextually appropriate, even if the model has never seen that exact sentence during training. The vector space also allows for contextual understanding - for example, the word "bank" has totally different meanings when used near a river vs. when used in the context of money Transfer Learning and Domain Adaptability: LLMs can be fine-tuned on specialized datasets and can adapt to the new domain - this is a form of generalized learning and points to the predictive power of these models. There is overwhelming evidence that LLMs like GPT-4 have remarkable generalization capabilities as well. The combination of memorization and generalization is what makes it so effective. LLMs are in the Goldilocks zone where a model can both memorize and generalize effectively. Too much memorization can lead to overfitting, where the model is too tailored to the training data and sucks at handling new info. Newer LLMs will evolve to memorize and generalize more and will be capable of complex reasoning tasks much like humans do.

bindureddy's tweet photo. The Delicate Dance of Large Language Models - Memorization vs. Generalization

Critics of LLMs claim that they are stochastic parrots that do not do much more than memorize the world's information. Others, however, point to examples of LLMs displaying logic and reasoning abilities and claim they are beginning to exhibit emergent properties and may evolve to be a superintelligence that threats our very existence

It turns out that LLMs are actually pretty good at memorization.

Data Imprinting: During the training phase, a language model essentially encodes specific data points into the weights of the neural network. This is akin to how you remember the capital of France is Paris.

N-gram Storage: Essentially, a trained model will have "memorized" a lot of sequences of words (N-grams), but this isn't memory in the way humans understand it. It's a complex mathematical representation that allows LLMs to predict the next word in a sentence based on the preceding words.

This means that LLMs are extremely good at retrieving pretty much any data that they have been trained on and presenting it in a coherent way. In other words, LLMs are capable of memorizing without overfitting

This is also why GPT-4 is exceptionally good at answering questions on information BEFORE its training cut-off date of September 2021 and doesn't do well on information after that cut-off date.

In addition, LLMs don't have the capability of learning on the fly - Any information that you send it in real-time isn't memorized making them far less capable than the human brain

Generalization: The real magic happens when LLMs can take what they've learned and apply it to new situations.

Fundamentally, generalization happens because language models map linguistic constructs into a high-dimensional vector space, wherein each dimension could hypothetically represent a unique linguistic feature—be it syntactical, semantic, or otherwise. Predictive capabilities in this space enable generalization to previously unseen data.

Each sentence or word is represented in this mathematical space with similar sentences like "How is the weather" or "Tell me more about the climate" being close together. This space is what enables LLMs to "understand" the language.

When a new sentence is introduced, LLMs can find its place in this semantic universe based on its features. This helps it generate responses that are contextually appropriate, even if the model has never seen that exact sentence during training.

The vector space also allows for contextual understanding - for example, the word "bank" has totally different meanings when used near a river vs. when used in the context of money

Transfer Learning and Domain Adaptability: LLMs can be fine-tuned on specialized datasets and can adapt to the new domain - this is a form of generalized learning and points to the predictive power of these models.

There is overwhelming evidence that LLMs like GPT-4 have remarkable generalization capabilities as well. The combination of memorization and generalization is what makes it so effective.

LLMs are in the Goldilocks zone where a model can both memorize and generalize effectively. Too much memorization can lead to overfitting, where the model is too tailored to the training data and sucks at handling new info. Newer LLMs will evolve to memorize and generalize more and will be capable of complex reasoning tasks much like humans do.

836

196

706

210K

vphalak retweeted

Jim O'Shaughnessy

@jposhaughnessy

over 2 years ago

Tao Te Ching, Verse 17 "When the Master governs, the people are hardly aware that he exists. Next best is a leader who is loved. Next, one who is feared. The worst is one who is despised. If you don't trust the people, you make them untrustworthy. The Master doesn't talk, he acts. When his work is done, the people say, "Amazing: we did it, all by ourselves!"" ~Lao Tzu

277

119

118K

Vivek Phalak @vphalak

almost 3 years ago

Exciting times for open source model tuners

Paul Graham

@paulg

almost 3 years ago

Phind finds fine-tuned CodeLlama-34B beats GPT-4. https://t.co/oqCF9ZmZEG

157

509

555K

vphalak retweeted

Sanyam Bhutani

@bhutanisanyam1

almost 3 years ago

CS 25 has a great roadmap of LLM papers! 🙏 Transformers United has great guest lectures spanning the foundations of Large Language Models An underrated aspect of the course is the curated list of papers on every topic Perfect for your weekend reads: https://t.co/WNbLfzf3Pf

bhutanisanyam1's tweet photo. CS 25 has a great roadmap of LLM papers! 🙏

Transformers United has great guest lectures spanning the foundations of Large Language Models

An underrated aspect of the course is the curated list of papers on every topic

Perfect for your weekend reads:

https://t.co/WNbLfzf3Pf https://t.co/iOBGPFqtqh

525

116

451

68K

vphalak retweeted

Shreyas Doshi

@shreyas

almost 3 years ago

9 tips for misery in work & life: 1) See self as a victim 2) Complain constantly 3) Resent successful people 4) Compare habitually 5) Nitpick to feel superior 6) Assume bad intent 7) Refuse to listen to others 8) Aim to impress everyone 9) Learn only by making mistakes

665

326

74K

vphalak retweeted

Mark Tenenholtz

@marktenenholtz

almost 3 years ago

Anyone in AI who is ignoring tabular data is ignoring 90%+ of the economy and 95% of the people who would most benefit from AI.

846

282

247K

Vivek Phalak @vphalak

about 3 years ago

@sathyarg Love this interpretation. And the point of self inquiry would be to understand ones true nature. And not get attached too much to the code you write, the ppts you create, your diagnosis,... :-) Tech nor time has anything to do with it.

vphalak retweeted

pcawdron.bsky.social & http://aus.social/@pcawdron @PeterCawdron

over 3 years ago

This is the only real time travel paradox

233

51K

10K

vphalak retweeted

graas @graas_ai

over 3 years ago

We're on the lookout for a Data Engineer with 2+ years of experience, and a passion to troubleshoot issues across various data sources and platforms! Apply here -https://t.co/wGsfDV4pFj or share the resumes at [email protected] #eCommerce #Growth #Hiring #DataEngineer

graas_ai's tweet photo. We're on the lookout for a Data Engineer with 2+ years of experience, and a passion to troubleshoot issues across various data sources and platforms! Apply here -https://t.co/wGsfDV4pFj or share the resumes at careers@graas.ai

#eCommerce #Growth #Hiring #DataEngineer https://t.co/O7zh0h1tyo

vphalak retweeted

Mark Tenenholtz

@marktenenholtz

over 3 years ago

A model is NOT just a LightGBM/XGBoost/NN/etc. A model is the algorithm *plus*: • The parameters • The data preprocessing • The engineered features Good deployment pipelines make these inextricable and/or easily reproducible.

172

Vivek Phalak

@vphalak

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users