Microsoft just launched Copilot Vision and it's INSANE
It can see what you're doing on the web and talk with you about it.
7 wild examples so far (Don't miss the 5th one):
Oh wow! I did not know that @Dropbox built a calendar AI application called @reclaimai
Definitely worth checking out if you are looking for AI-driven #ProductivityHacks. It supports Google Calendar, with Outlook on the way.
https://t.co/yLsYyjwQec
End-to-end speech models are on fire - LLaMA-Omni 8B - Apache licensed! 🔥
> Speech Encoder - Whisper Large v3
> LLM backbone - Llama 3.1 8B Instruct
> Speech Decoder - HuBERT (UnitY)
> Generates speech + text simultaneously
> Under 250 ms latency
> Trained in less than 3 days on 4 GPUs
> Trained on 200K instruction pairs
> Model checkpoints on the Hub 🤗
> Space incoming!
GG! I'm here for this trend!
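"Simultaneously generate speech + text" is what keeps latency low: audio can start playing before the full text reply exists. Below is a toy Python sketch of that streaming idea, not LLaMA-Omni's code. `decode_units` is a hypothetical stand-in for the speech decoder (a real system would feed it the LLM's internal states rather than token strings), and `chunk_size` is an assumed knob.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, Union

# An event is either ("text", token) or ("speech", [unit ids]).
Event = Tuple[str, Union[str, List[int]]]


def stream_text_and_speech(
    text_tokens: Iterable[str],
    decode_units: Callable[[List[str]], List[int]],
    chunk_size: int = 4,
) -> Iterator[Event]:
    """Interleave text output with chunked speech-unit output.

    `decode_units` is a hypothetical placeholder for a speech decoder.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    pending: List[str] = []
    for token in text_tokens:
        yield ("text", token)           # text goes out as soon as it is generated
        pending.append(token)
        if len(pending) == chunk_size:  # full chunk: its audio can start playing now
            yield ("speech", decode_units(pending))
            pending = []
    if pending:                         # flush the final partial chunk
        yield ("speech", decode_units(pending))
```

For example, with a dummy decoder `lambda chunk: [len(t) for t in chunk]` and `chunk_size=2`, the tokens `["Hi", "there", "!"]` produce two text events, a speech event `[2, 5]`, one more text event, then a final speech event `[1]`. A smaller `chunk_size` gets audio out sooner at the cost of more decoder calls.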
I take this seriously. Starting next year, I plan to only publicly mention (in blogs, talks, etc.) L2s that are stage 1+, with *maybe a short grace period* for new genuinely interesting projects.
It doesn't matter if I invested, or if you're my friend; stage 1 or bust.
Multiple ZK-rollup teams have told me they're on track to be stage 1 by year end. I'm excited to see that happen!
Of course, we should not throw away the training wheels before we're actually confident that the proof systems are secure; that would be irresponsible. But stage 1 (a 75% council threshold to override the proof system, with 26%+ of the council outside the rollup team) is a very reasonable, moderate milestone. The multisigs I'm in have not had a single liveness failure in years, let alone one where 26% of signers were unavailable at once.
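The council rule quoted above boils down to integer arithmetic. Here is a minimal Python sketch that checks only the two numbers in this post (a 75%+ override threshold and 26%+ of signers outside the rollup team); it is a simplification, not L2BEAT's full stage 1 criteria, and `SecurityCouncil` plus the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityCouncil:
    members: int       # total signers on the council multisig
    outside_team: int  # signers NOT affiliated with the rollup team
    threshold: int     # signatures required to override the proof system

    def __post_init__(self) -> None:
        if not (0 <= self.outside_team <= self.members and 1 <= self.threshold <= self.members):
            raise ValueError("inconsistent council parameters")


def meets_stage1_council_rule(c: SecurityCouncil) -> bool:
    """Simplified check of the two numbers in the post (integer math avoids float rounding)."""
    threshold_ok = 100 * c.threshold >= 75 * c.members   # at least 75% must sign
    outside_ok = 100 * c.outside_team >= 26 * c.members  # at least 26% outside the team
    return threshold_ok and outside_ok


def team_can_override_alone(c: SecurityCouncil) -> bool:
    """True if the rollup team's own signers already reach the threshold."""
    return c.members - c.outside_team >= c.threshold


def signers_offline_to_block(c: SecurityCouncil) -> int:
    """Smallest number of unavailable signers that makes the threshold unreachable."""
    return c.members - c.threshold + 1
```

For a hypothetical 12-member council with 4 outside signers and a 9-of-12 threshold, the rule passes, the team (8 signers) cannot override alone, and 4 offline signers would stall an override. An 8-member, 6-of-8 council with only 2 outsiders (25%) fails, and its team can act alone. That is why the two numbers pair up: with 26%+ outside, the team holds at most 74%, just short of the 75% threshold.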
The era of rollups being glorified multisigs is coming to an end. The era of cryptographic trust is upon us.
if you strap a rocket to a dumpster, the dumpster can still get to orbit, and the trash fire will go out as it leaves the atmosphere.
many important insights contained in that observation.
but also it's better to launch nice satellites instead.
IT'S FINALLY HERE!
🔥 Freepik Mystic 🔥
“Any sufficiently advanced technology is indistinguishable from magic.” (Arthur C. Clarke) ✨
Mystic is the most advanced AI image generator to date, with outputs directly in Full HD.
But what really is Mystic? Let's dive in 🧵
Quite a packed but stellar week for Open Science AI:
1. Microsoft open-sourced Phi-3.5 mini, MoE, and vision, all with 128K context, multilingual support, and an MIT license!
The MoE beats Gemini Flash, and Vision is competitive with GPT-4o.
2. NVIDIA released Mistral-NeMo-Minitron 8B: pruned and distilled from the 12B model, under a commercially permissive license, and it beats its 12B teacher on multiple benchmarks.
3. AI21 Labs released Jamba 1.5 Mini (12B active / 52B total) and Large (94B active / 398B total): MoE, permissively licensed, 256K context, multilingual, JSON mode and tool use.
4. Homebrew released Llama-s v0.2, giving Llama 3.1 8B ears (it can process audio directly), with results quite comparable to the SoTA! Still a WIP, but an exciting direction.
What did I miss?
Looking forward to next week ;)
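The active/total split in the Jamba 1.5 numbers matters in practice: an MoE must hold every expert in memory, but each token only runs through the active parameters. A rough Python sketch, assuming bf16 weights (2 bytes per parameter) and the common rule of thumb of ~2 FLOPs per active parameter per token; it ignores KV cache, Mamba state, activations, and sequence-length effects, so treat the numbers as back-of-envelope only.

```python
def moe_back_of_envelope(active_b: float, total_b: float, bytes_per_param: float = 2.0) -> dict:
    """Rough MoE estimates; parameter counts are given in billions.

    - weight memory scales with TOTAL params (all experts must be loaded)
    - per-token forward compute scales with ACTIVE params (~2 FLOPs each)
    """
    if not 0 < active_b <= total_b:
        raise ValueError("need 0 < active <= total")
    return {
        "active_fraction": active_b / total_b,
        "weight_gb": total_b * bytes_per_param,  # 1e9 params * bytes = GB (1e9 bytes)
        "gflops_per_token": 2.0 * active_b,      # 2 * active_b * 1e9 FLOPs
    }


for name, active, total in [("Jamba 1.5 Mini", 12, 52), ("Jamba 1.5 Large", 94, 398)]:
    est = moe_back_of_envelope(active, total)
    print(f"{name}: {est['active_fraction']:.0%} active, "
          f"~{est['weight_gb']:.0f} GB of bf16 weights, "
          f"~{est['gflops_per_token']:.0f} GFLOPs/token")
# Jamba 1.5 Mini: 23% active, ~104 GB of bf16 weights, ~24 GFLOPs/token
# Jamba 1.5 Large: 24% active, ~796 GB of bf16 weights, ~188 GFLOPs/token
```

Roughly a quarter of each model does work per token, which is why an MoE can be much faster than a dense model of the same total size while needing just as much memory.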
Here's a fun notebook that shows you how to set up a multi-agent system that uses Gemma via Ollama to answer questions and fact-check the answers.
https://t.co/zH16tahvZ6
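The answer-then-fact-check pattern fits in a few lines. This is a minimal sketch, not the notebook's code: it calls Ollama's local REST endpoint (`/api/generate`, non-streaming) with only the Python standard library. The model tag `gemma2`, the prompts, and the SUPPORTED/UNSUPPORTED convention are assumptions; `generate` is injectable so the two "agents" can be tested without a running server.

```python
import json
import urllib.request
from typing import Callable

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "gemma2"  # assumption: any Gemma tag you have pulled with `ollama pull`


def ollama_generate(prompt: str, model: str = MODEL) -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def answer_and_fact_check(question: str, generate: Callable[[str], str] = ollama_generate) -> dict:
    # Agent 1: answers the question.
    answer = generate(f"Answer the following question concisely.\nQuestion: {question}")
    # Agent 2: reviews agent 1's answer and starts its reply with a verdict keyword.
    verdict = generate(
        "You are a fact checker. Reply with SUPPORTED or UNSUPPORTED on the first line, "
        "then a one-sentence justification.\n"
        f"Question: {question}\nProposed answer: {answer}"
    )
    lines = verdict.strip().splitlines()
    first_line = lines[0].strip().upper() if lines else ""
    return {
        "question": question,
        "answer": answer,
        "verdict": verdict,
        "supported": first_line.startswith("SUPPORTED"),  # "UNSUPPORTED" does not match
    }
```

With `ollama serve` running and the model pulled, `answer_and_fact_check("Who wrote On the Origin of Species?")` returns the answer, the checker's raw verdict, and a boolean. Real models don't always follow the reply format, so a sturdier version would retry or parse more leniently.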
Multimodal Llama 3.1: Llama 3.1 just got ears! 🔥
From @homebrewltd - Llama 3.1 S.
It uses early fusion with semantic tokens; the entire pipeline looks like this:
Audio -> WhisperVQ (Encoder) -> Semantic tokens -> Llama 3.1
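"Early fusion" here means the WhisperVQ codes become ordinary tokens in the LLM's vocabulary, so audio and text share one input sequence. A minimal Python sketch of that idea, not Homebrew's implementation: the 512-entry codebook, the placement of the new IDs right after the text vocabulary, and the two marker tokens are all assumptions for illustration.

```python
from typing import List, Sequence


def fuse_audio_and_text(
    semantic_codes: Sequence[int],
    text_ids: Sequence[int],
    text_vocab_size: int,
    codebook_size: int = 512,  # assumption: size of the WhisperVQ codebook
) -> List[int]:
    """Build one token-ID sequence: [SOUND_START] audio codes [SOUND_END] text.

    Audio code c is mapped to the new vocabulary entry text_vocab_size + c;
    the two marker IDs are hypothetical and sit right after the audio block.
    """
    sound_start = text_vocab_size + codebook_size
    sound_end = text_vocab_size + codebook_size + 1
    for c in semantic_codes:
        if not 0 <= c < codebook_size:
            raise ValueError(f"code {c} outside codebook of size {codebook_size}")
    audio_ids = [text_vocab_size + c for c in semantic_codes]  # shift past the text vocab
    return [sound_start] + audio_ids + [sound_end] + list(text_ids)
```

With Llama 3.1's 128,256-entry vocabulary you would pass `text_vocab_size=128256` and resize the embedding matrix by `codebook_size + 2` rows. The pre-training step then simply runs next-token prediction over these fused sequences.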
Training:
1. Pre-train Llama 3.1 8B on the MLS 10K dataset with a next-token prediction task.
2. Instruction-tune on synthetic audio + text pairs.
The synthetic audio is generated with WhisperSpeech.
Very excited about their community-driven feedback approach. Looking forward to the next iteration.
Check out their demo below!
Tomorrow at 11am PST, there will be a livestream announcement from Google about Ollama + Google Cloud Run.
Watch live with us:
https://t.co/9RYV1s5DGM
I am lucky to have access to "Mystic" by @javilopen. All I can say is that the quality of images blew me away. Click on the images and see the full glory for yourself!
Introducing Speech to Speech! 🔥 A modular, cross-platform pipeline to run GPT-4o-like experiences on device, 100% private, with latency as low as 500 ms! ⚡
We brought the best of Transformers together in one package:
> Voice Activity Detection: Silero VAD v5
> Speech-to-Text (STT): Whisper
> Language Model (LLM): Any instruct model
> Text-to-Speech (TTS): Parler-TTS
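The modularity comes from treating each stage as a swappable component in a cascade: VAD gates the microphone stream, and each detected utterance flows through STT, the LLM, and TTS. A minimal Python sketch of that structure, not the project's code: the four callables are hypothetical placeholders where Silero VAD, Whisper, an instruct model, and Parler-TTS would plug in, and `max_silence_chunks` is an assumed end-of-turn rule.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence

Chunk = Sequence[float]  # one block of audio samples


@dataclass
class SpeechToSpeech:
    is_speech: Callable[[Chunk], bool]          # VAD stage (e.g. Silero VAD)
    transcribe: Callable[[List[float]], str]    # STT stage (e.g. Whisper)
    respond: Callable[[str], str]               # LLM stage (any instruct model)
    synthesize: Callable[[str], List[float]]    # TTS stage (e.g. Parler-TTS)
    max_silence_chunks: int = 1                 # silent chunks that end an utterance

    def run(self, chunks: Iterable[Chunk]) -> List[List[float]]:
        """Return one synthesized reply per detected utterance."""
        replies: List[List[float]] = []
        buffer: List[float] = []
        silence = 0
        for chunk in chunks:
            if self.is_speech(chunk):
                buffer.extend(chunk)            # keep accumulating the utterance
                silence = 0
            elif buffer:                        # silent chunks are dropped, only counted
                silence += 1
                if silence >= self.max_silence_chunks:
                    replies.append(self._turn(buffer))
                    buffer, silence = [], 0
        if buffer:                              # flush speech left when the stream ends
            replies.append(self._turn(buffer))
        return replies

    def _turn(self, audio: List[float]) -> List[float]:
        text = self.transcribe(audio)
        reply = self.respond(text)
        return self.synthesize(reply)
```

Swapping a stage means passing a different callable, e.g. a quantized LLM for `respond`, without touching the rest. A real-time version would run the stages concurrently on separate threads or queues instead of this single loop.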
Paired with built-in quantisation schemes like AWQ, GPTQ, and BnB, you can reduce memory use and get faster inference! 🔥
Works on Mac and CUDA. This is just the first iteration; we're working on making it better and faster! Feedback is more than appreciated! 🤗
Kudos to @eustachelb for leading the charge on this front! ❤️