12 years as a Product Manager. Founder of multiple apps and websites.
Get busy living or get busy dying. Craving to build great products.
⚡Product | AI | Crypto | DeFi | HealthTech⚡
🎥 Today we’re premiering Meta Movie Gen: the most advanced media foundation models to-date.
Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We’re excited for the potential of this line of research to usher in entirely new possibilities for casual creators and creative professionals alike.
More details and examples of what Movie Gen can do ➡️ https://t.co/M19x2ndwnr
🛠️ Movie Gen models and capabilities
Movie Gen Video: 30B parameter transformer model that can generate high-quality and high-definition images and videos from a single text prompt.
Movie Gen Audio: A 13B parameter transformer model that can take a video input along with optional text prompts for controllability to generate high-fidelity audio synced to the video. It can generate ambient sound, instrumental background music and foley sound — delivering state-of-the-art results in audio quality, video-to-audio alignment and text-to-audio alignment.
Precise video editing: Using a generated or existing video and accompanying text instructions as an input it can perform localized edits such as adding, removing or replacing elements — or global changes like background or style changes.
Personalized videos: Using an image of a person and a text prompt, the model can generate a video with state-of-the-art results on character preservation and natural movement in video.
We’re continuing to work closely with creative professionals from across the field to integrate their feedback as we work towards a potential release. We look forward to sharing more on this work and the creative possibilities it will enable in the future.
Input optional product
Don't ask your users for input. Coming up with input is hard, and a barrier to use. Think of users as wanting to play. We have AI - predict the input! Design products as autonomous environments. Allow users to play by steering a bit.
a little Sunday surprise for you...
meet @browsercompany's 2nd product:
🔍Arc Search🔎
it's a default browser for your iPhone
...that BROWSES FOR YOU
the origin story is a bit unusual so I wanted to give you the full backstory:
"The best designers and the best programmers aren’t the ones with the best skills, or the nimblest fingers, or the ones who can rock and roll with photoshop or vim, they are the ones that can determine what just doesn’t matter. That’s where the real gains are made."
There's a new multi-modal RAG stack that's emerging, letting users do QA over complex documents and images.
Here's a diagram and 🧵 of what it consists of 👇
Multi-modal RAG extends beyond RAG in the following ways:
* Input: The input can be a text or image query.
* Embeddings: You can natively embed/index images with joint embeddings (CLIP). You can choose to embed text the same way or use specialized text embeddings (e.g. ada)
* Storage: Use a vector database to store image embeddings. The image file itself could live in a separate docstore or in the vector db. You can use the same vector db for image/text storage (e.g. @trychroma ) or separate collections
* Retrieval: Given a user query, the retrieved context can be text, images or both. If you use separate image/text embeddings, you need two retrieval calls.
* Synthesis: We can use either a multi-modal model (GPT-4V) that can take in both text and images, or a standard LLM (gpt-4-turbo) that takes in just text. If the latter, you may need to caption/summarize each image into text.
* Response: The returned result can be text or images.
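A minimal sketch of the retrieval and synthesis steps in the list above, assuming separate text and image embedding spaces (hence two retrieval calls). Everything here is hypothetical and for illustration only: the `Node` class, the helper names, the file paths and the hand-written toy vectors are not from @llama_index or any other library, and a real stack would get embeddings from a text embedding model and a joint model such as CLIP.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    kind: str               # "text" or "image"
    content: str            # passage text, or a path/id pointing at the image file
    embedding: list[float]  # toy vector; a real system would call an embedding model
    caption: str = ""       # text summary of an image, used with text-only LLMs


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, nodes, k):
    # Rank nodes by cosine similarity to the query and keep the best k.
    return sorted(nodes, key=lambda n: cosine(query_vec, n.embedding), reverse=True)[:k]


def retrieve(text_query_vec, image_query_vec, text_nodes, image_nodes, k=1):
    # Separate embedding spaces mean two retrieval calls, one per store.
    return top_k(text_query_vec, text_nodes, k) + top_k(image_query_vec, image_nodes, k)


def build_context(nodes, multimodal_llm):
    # A multi-modal model can take the image itself; a text-only LLM gets the caption.
    parts = []
    for n in nodes:
        if n.kind == "text":
            parts.append(n.content)
        elif multimodal_llm:
            parts.append(f"<image:{n.content}>")
        else:
            parts.append(f"[image caption] {n.caption}")
    return parts


# Toy data: 2-d "text" embeddings and 3-d "image" embeddings (made-up values).
text_nodes = [
    Node("text", "Q3 revenue grew 12%.", [1.0, 0.0]),
    Node("text", "The office moved in May.", [0.0, 1.0]),
]
image_nodes = [
    Node("image", "charts/revenue.png", [0.9, 0.1, 0.0], caption="Bar chart of quarterly revenue."),
    Node("image", "photos/office.jpg", [0.0, 0.2, 0.9], caption="Photo of the new office."),
]

# The same user query, embedded once per space.
hits = retrieve([0.9, 0.1], [1.0, 0.0, 0.0], text_nodes, image_nodes, k=1)
context = build_context(hits, multimodal_llm=False)
print(context)
```

With a single joint embedding space for both text and images, the two stores and two query vectors would collapse into one, and `retrieve` would need only one call.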
We're building towards this future. We launched multi-modal indexing/retrieval abstractions in @llama_index today. We're building towards the following:
💡 More multi-modal embeddings/LLMs
💡 More ways to store images/text in different storage systems
💡 More ways to combine/apply lessons from text retrieval to image retrieval
💡 More ways to synthesize over arbitrary text/images
Blog: https://t.co/VtFgtOZzHi
Multi-modal RAG example: https://t.co/yh6e78frXK
I must say China is now behind the U.S. in the LLM ecosystem momentum. The underlying vision for me to start @01AI_Yi is to make better AI accessible to more people. We are glad our first moderate-size Yi-34B performs competitively at a global level. More to come soon after this base model. https://t.co/WjUd18RN7f
https://t.co/OCcpAtqgTQ
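I have been a product manager in the internet industry for more than 10 years, and I have spent nearly a year going all in on AI, researching many no-code AI and ChatGPT methods and how to automate work with AI flows. I believe the future will be the era of the AI Creator.
AI course: https://t.co/2BpXxFVVOB (NT$500 discount code: aicreator)
#AI #ChatGPT #AICreator #AIAutomation #AIGeneration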