Now, with voice technology and AI for grammar correction, I find I can produce 3-4x more content, between 2,000 and 3,000 words, without worrying about minor language issues! This has allowed me to express my ideas more fully and creatively.
I built a voice writer tool to help you write things quickly. It uses AI for speech recognition and grammar correction. I have been using it for my book reviews, emails, Slack messages, and more. Here is a demo video.
Try it out here: https://t.co/wJLi4wmTR9
In this video, I cover the top 10 most cited papers in the history of natural language processing, ranked by number of Google Scholar citations. We cover milestones like the Transformer model, RNN, word vectors, and even go back to the roots with WordNet!
Challenges we faced:
- Teochew is related to Mandarin, a high-resource language, but how do we apply transfer learning?
- With zero resources for training, we had to build our dataset from scratch. 🛠️
- Teochew doesn't even have a writing system! How do we model that?
New Video! In this video, we train a speech recognition model (using OpenAI's Whisper) to recognize our family's Chinese dialect, Teochew, or Chaozhou dialect (潮州话). It has about 10 million speakers and is part of the Min Nan language family.
https://t.co/0BIpHngtAA
More seriously: we'll use the RAG pattern, indexing HuggingFace metadata and combining OpenAI embeddings with pgvector and chat models. I'll also share some tips on how to rerank the chatbot's suggestions, deploy the project efficiently, and more.
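A rough sketch of the retrieve-then-rerank idea, not the code from the video. Everything here is a stand-in: the hand-written vectors replace OpenAI embeddings, the in-memory list replaces a pgvector table, and the names `cosine`, `retrieve`, `rerank`, the `model-a/b/c` entries, and reranking by a `downloads` field are all hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query_vec, index, k=2):
    """Return the k index entries whose vectors are closest to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item["vec"]), reverse=True)
    return ranked[:k]


def rerank(candidates, key="downloads"):
    """Reorder retrieved candidates by a metadata field (an assumed heuristic)."""
    return sorted(candidates, key=lambda item: item[key], reverse=True)


# Toy index: made-up model metadata with made-up 3-d "embeddings".
index = [
    {"name": "model-a", "vec": [0.9, 0.1, 0.0], "downloads": 120},
    {"name": "model-b", "vec": [0.8, 0.2, 0.1], "downloads": 900},
    {"name": "model-c", "vec": [0.0, 0.1, 0.9], "downloads": 5000},
]

query = [1.0, 0.0, 0.0]  # stand-in for the embedded user question
top = rerank(retrieve(query, index, k=2))

# The retrieved entries would then be placed in the chat model's prompt.
prompt = "Suggest a model from: " + ", ".join(m["name"] for m in top)
```

Retrieval narrows the candidates by similarity first, so the popular but unrelated `model-c` never reaches the rerank step.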
New Video! Ever had trouble deciding which AI to use for your projects? Let's solve that with AI. In this video, I will build an AI to find the best AI for you 🤯
https://t.co/YniD6lUbrQ
@osanseviero I've made a video about the KV cache, and I've also got videos on other LLM topics like RoPE embeddings, speculative sampling, quantization, etc. If you like learning through colorful animated videos, check them out! https://t.co/Bva2ctztqI
It's a new technique called speculative sampling. A smaller LLM drafts the easier tokens and a larger LLM checks them. Thanks to a rejection sampling trick, there is no difference in accuracy!
Check out my video on how this works: https://t.co/eTwLIGh75R
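A minimal sketch of the accept/reject step for a single token, assuming two plain probability lists stand in for the small (draft) and large (target) models. The names `speculative_step`, `p_target`, and `q_draft` are illustrative and not taken from the video.

```python
import random


def speculative_step(p_target, q_draft, rng):
    """Draft one token from q_draft, then verify it against p_target.

    Both arguments are probability lists over the same toy vocabulary.
    Returns (token_index, accepted).
    """
    vocab = range(len(q_draft))

    # The small model proposes a token.
    x = rng.choices(vocab, weights=q_draft)[0]

    # The large model accepts it with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x, True

    # On rejection, resample from the leftover mass max(0, p - q).
    # random.choices normalizes the weights for us.
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    x = rng.choices(vocab, weights=residual)[0]
    return x, False


rng = random.Random(0)
p = [0.5, 0.3, 0.2]  # toy "large model" distribution
q = [0.2, 0.2, 0.6]  # toy "small model" distribution
token, accepted = speculative_step(p, q, rng)
```

Repeating this step many times yields tokens distributed according to `p`, even though every proposal came from `q`. That is why the larger model's output quality is preserved.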
Just published a comprehensive video highlighting EVERY area of Natural Language Processing research, in 24 categories. From Phonology to Translation to Summarization to LLMs, explore all of NLP in 30 minutes!
@sherwinwu ChatGPT clearly states that sensitive information should not be fed to it. My question to you is: if OpenAI does not train on user data, then why is this warning in place?