#ChatGPT and #GPT3 are hot. But let’s be practical when we want to reproduce GPT-3 or use it in our applications. Why did all of the public reproductions of GPT-3 fail? For which tasks should we use GPT-3.5/ChatGPT? I tried to answer these questions in a new blog post: https://t.co/KMoyiC7eh0
Excited to share Ornith, our latest family of open-source models specialized for agentic coding.
Ornith achieves SOTA performance among open-source models of comparable size on a variety of coding benchmarks (Terminal-Bench 2.1, SWE, NL2Repo, OpenClaw, SWE Atlas, etc.).
Feedback is deeply appreciated!
📖Tech Blog: https://t.co/MiaaDExj9B
🤗Huggingface: https://t.co/eDtzanc5Vp
This is what I suggested to @lawhy_X for agentic RL back then. He moved incredibly fast, implemented it, and clearly demonstrated its effectiveness. Check out the implementation!
The next frontier of AI is not only a more capable model; it is an AI that *humans* can meaningfully live and work with :)
With all students in my cs329x Human-Centered LLM class, we present 60+ pages of insights for developing Human-Centered LLMs (HCLLMs), from design & data sourcing to training, eval & deployment 🧵
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at https://t.co/GCdiMzk1Dl via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: https://t.co/drlDrxkYtp
🤗 Open Weights: https://t.co/T13Y8i7SDM
1/n
🚀 Excited to announce Vision Banana 🍌 and our new paper: “Image Generators are Generalist Vision Learners”. We turn Nano Banana Pro into a state-of-the-art visual generation and understanding model.
🖼️ Check out our gallery at https://t.co/CEQJXroPaE
🧵 (1/N) continue ⬇️
Introducing @NeoCognition, the agent lab for specialized intelligence.
Everyone needs experts, but human expertise does not scale.
Backed by $40M seed funding, we build self-learning agents that specialize across domains to make expertise abundant.
1/ We just released π0.7 — a steerable generalist robot model with emergent capabilities.
I want to share a bit of the backstory, because π0.7 taught me something surprising about where robot learning is heading. A thread on bittersweet lessons 🧵
Very excited to share that our project was selected as a @LaudeInstitute Moonshots seed grant winner on workforce upskilling @tatsu_hashimoto @erikbryn
This is something I think about a lot these days. With so much uncertainty about AI and jobs today, I'm deeply motivated by the question of how we can use LLMs not to replace people, but to empower them...
“Automated research on outcome-gradable problems is already practical.” This work strongly validates my own experience using agents for automated research. I’m very excited to see this research come out—huge congrats to @liangqiu_1994! He’s a brilliant researcher, and I’m constantly inspired by his passion, creativity, and rigor.
New research result: we use Claude to make fully autonomous progress on scalable oversight research, as measured by performance gap recovered (PGR).
Claude iterates on a number of different techniques and ends up significantly outperforming human researchers for $18k in credits.
Excited to share Muse Spark 🥑, a big step in MSL's journey towards personal superintelligence.
Try it out on https://t.co/rgVnOxYD04 and let us know your feedback!
It’s been an exciting nine months training this model from scratch. I’m especially proud of the opportunity to rebuild the foundational infrastructure alongside the strongest infra team I’ve ever worked with. The systems we’ve built will serve as a solid foundation for many more models to come. Stay tuned!
Excited to share Muse Spark, the first model from the whole team’s work at MSL! 🚀
It’s natively multimodal and agentic. I’ve been using it for my daily coding and research tasks. Still plenty of room to improve in agentic domains, but we’re moving with great velocity.
It’s a seriously good model! Check out the full breakdown and try it out at https://t.co/Fka0wdAswy
Check out Muse Spark, our first milestone in the quest for personal superintelligence! Scaling this with the team has been a total blast. Give it a spin and let us know what you think! 🥑