Today, we’re excited to introduce Miso One, the most emotive voice model in the world.
Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency.
We’ve open-sourced the model weights, with API access coming soon.
Hear how Miso One sounds in the thread below.
Meet "Tawan" (ตะวัน means “The sun” 🌞)
20-min Demo
AI Animated Film experiment (full feature is 90 mins)
Made with a $500 budget and 1.5 months of work. The story is based on my original idea from 2010... and today, the technology is finally ready to bring it to life.
AI showed me that solo creators can now manage an entire animation workflow. No need to pitch to big studios—you can fund yourself and bring your own stories to life. Without AI, this would still be stuck in my head.
(Of course, the quality is still far from high-budget films from big studios that have 300-600 people behind them, but I think the gap will close step by step in the future.)
This animation was created with Seedance 2.0 on @dreamina_ai and @kinovi_ai, with the opening scene by @midjourney - thank you for watching.
A 6-person team is building task-specific AI models that are 4-8x faster than anything from OpenAI or Anthropic. 500K downloads on HuggingFace. No hype. Just better engineering winning on the merits.
This is what "make something people want" looks like in the model layer.
https://t.co/nsf8b31xha
INCREDIBLE
The MOST COMPLETE GUIDE for understanding LLMs from first principles is now available online to read for free
Covers the model mechanics
- Tokens / tokenizers
- Transformers
- Attention
- KV cache
- Prefill vs decode
- Decoding controls
- Model packages
- Chat templates
- Long context
- RAG
- Agents / tools
- Fine-tuning
- Multimodal models
Then connects that to running models locally
- What "local" really means
- Open-weight vs opensource
- Quantization
- VRAM math
- Hardware tiers
- File formats / load safety
- Runtimes / serving modes
- Model selection
- Privacy
- Failure modes
- Benchmarks
- Practical setup paths
You should read this, and if you cannot now then you most definitely wanna bookmark it for later
Opensource AI FTW
New work with @AlecRad and @DavidDuvenaud:
Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text.
Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:
This works really well btw, at the end of your query ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.
More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Around a third of our brains is a massively parallel processor dedicated to vision; it is the 10-lane superhighway of information into the brain. As AI improves, I think we'll see a progression that takes advantage:
1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default
...4,5,6,...
n) interactive neural videos/simulations
Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral https://t.co/z21CP5iQfu
There are also improvements necessary and pending at the input. Neither audio nor text nor video alone is enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.
TLDR: The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip: try asking for HTML.
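The HTML tip from this post can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: `ask_llm` is a hypothetical callable standing in for whatever LLM client you use (the post names none), and the fence-stripping step assumes the model may wrap its HTML in a Markdown code fence, which is a guess about typical behavior rather than something the post says.

```python
import webbrowser
from pathlib import Path
from typing import Callable

# The instruction quoted in the post, appended to the end of the query.
HTML_SUFFIX = "Structure your response as HTML."
FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable


def build_prompt(query: str) -> str:
    """Return the query with the HTML instruction added at the end."""
    return f"{query.rstrip()}\n\n{HTML_SUFFIX}"


def extract_html(response: str) -> str:
    """Remove a surrounding code fence if the response has one (assumption)."""
    text = response.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    return text


def save_and_view(html: str, path: Path, open_browser: bool = True) -> Path:
    """Write the HTML to disk and optionally open it in the default browser."""
    path.write_text(html, encoding="utf-8")
    if open_browser:
        webbrowser.open(path.resolve().as_uri())
    return path


def ask_as_html(
    query: str,
    ask_llm: Callable[[str], str],  # hypothetical: takes a prompt, returns text
    path: Path,
    open_browser: bool = True,
) -> Path:
    """Ask for an HTML-structured answer, save it, and view it."""
    response = ask_llm(build_prompt(query))
    return save_and_view(extract_html(response), path, open_browser)
```

To try it, pass any function that takes a prompt string and returns the model's text, e.g. `ask_as_html("Explain the KV cache", my_client_fn, Path("answer.html"))`, where `my_client_fn` is your own wrapper.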
The countries that developed in the post-WWII period share one variable more consistently than any cultural or ethnic characteristic:
Strategic value to the United States during the Cold War.
South Korea: essential. Fully developed.
Taiwan: essential. Fully developed.
Singapore: strategically useful. Developed.
Israel: essential. Massively subsidized, still today.
Western Europe: essential. Marshall Plan.
Japan: essential. Rebuilt from occupation.
Countries with limited Cold War strategic value, or whose development was contrary to Western corporate interests:
Ghana: destabilized.
Congo: leader assassinated, replaced with Mobutu, looted for thirty-two years.
Chile: democratic socialist government removed by coup, replaced with Pinochet, opened to Chicago School experiment.
Iran: democratic government removed in 1953, replaced with Shah, resources extracted until 1979.
Vietnam: destroyed.
Nicaragua: destabilized.
Bolivia: multiple coups.
The pattern is not cultural.
The pattern is strategic.
Develop what serves American power.
Destabilize what threatens it.
Present both outcomes as the natural result of the affected populations' own characteristics.
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at https://t.co/GCdiMzk1Dl via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: https://t.co/drlDrxkYtp
🤗 Open Weights: https://t.co/T13Y8i7SDM
1/n
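As a rough illustration of the "total / active" figures in this announcement, the share of parameters that is active can be computed directly from the stated numbers. This is back-of-envelope arithmetic only; it assumes "active" means parameters used per token, as is usual for mixture-of-experts models, which the post itself does not spell out. The names `MODELS` and `active_fraction` are made up for this sketch.

```python
# Parameter counts in billions, as stated in the announcement (1.6T = 1600B).
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600, "active_b": 49},
    "DeepSeek-V4-Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of all parameters that is active (assumed: per token)."""
    return active_b / total_b


for name, params in MODELS.items():
    share = active_fraction(**params)
    print(f"{name}: {share:.1%} of parameters active")
```

Both models activate only a few percent of their weights at a time under this reading, which is the usual argument for why such models can be cheaper to run than their total size suggests.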
Sri Lanka is testing the Metro Bus Digital Platform (MBDP) by Lanka Metro Transit, with a pilot launching this April on the Makubura–Pettah & Kadawatha routes.
• Real-time bus tracking
• Tap-in/Tap-out mobile payments
• Centralized smart transport system
#DigitalSriLanka #SriLanka #LKA
Chris Manning says Yann LeCun sees language as a low bandwidth communication channel compared to vision.
But the gap between a chimp and a human wasn’t produced by superior eyes.
What took off for humans was language.
Not just for communication, but as a cognitive tool.
Over the past few years I've had many discussions with people who want to build a house in Sri Lanka. Based on my experiences, I wrote a detailed guide to all the things you're going to learn the hard way.
Hope it's useful. Pass it on if it is.
Link:
https://t.co/KzakZw1YRV
Human-level general intelligence is achieved when an AI system can approach a new task and figure it out, without human intervention, *with the same learning efficiency as humans*.
If every new task requires human intervention, it's not general. If every new task requires brute-forcing, it's not human-level.
The #Cabinet spokesperson has finally admitted (indirectly) what we’ve been saying: the Govt has been overpricing fuel.
They’re still using inflated assumed barrel prices in the formula, then claiming small “subsidies” of Rs.20 on petrol & Rs.100 on diesel that aren’t based on real landed costs.
Partial admission is not enough. We demand full transparency: publish actual landed costs and the real formula every month.
Any genuine subsidy must be targeted to the vulnerable via cash transfers, not a universal blanket.
No more manipulation. Sri Lankans deserve honesty on fuel prices.
Watch here 👇
https://t.co/F8Jg2MvGBZ
#FuelPricing #TransparencyNow #TargetedSubsidies #SriLanka #HarshaDeSilva