Congrats @radixark !
From SGLang @lmsysorg to Miles, and to future products, RadixArk is dedicated to building a crucible capable of repeatedly producing cutting-edge AI, bringing the best of AI into every household. We believe in a future of AI diversity and hope to drive the integration of AI into every aspect of production and daily life. In the future we envision, AI will become a partner to many companies and individuals, finding ways to self-evolve—in production, in daily companionship, and within virtual worlds.
Everything we have experienced and will continue to experience in the SGLang and Miles open-source communities is unforgettable and highly anticipated. It has been both demanding and exhilarating, allowing us to see friendship, the world, and the boundaries.
Over the past six months, I have witnessed for the first time how a united team moves forward hand in hand, and how deeply passionate they are about creation. Each of us has taken on our respective roles and numerous new tasks for the first time; we are all stepping out of our comfort zones, growing, and creating at a rapid pace.
"It’s the step-by-step journey of a thousand miles that has carried us here today, and the same relentless march that will lead us into the tens of thousands of miles yet to come."
In an era where AI has made ordinary productivity cheaper, relentless, day-to-day refinement has increasingly become the rare key that drives innovation and the future. We hope this will forever remain the soul of RadixArk's culture: focused, uncompromising, humble, and fearless. The underlying logic of creation is not the deliberate pursuit of novelty, but rather independent thinking that remains unswayed by temptation, paired with a meticulous drive for perfection.
Good post from @balajis on the "verification gap".
You could see it as there being two modes in creation. Borrowing GAN terminology:
1) generation and
2) discrimination.
e.g. painting - you make a brush stroke (1) and then you look for a while to see if you improved the painting (2). these two stages are interspersed in pretty much all creative work.
Second point. Discrimination can be computationally very hard.
- images are by far the easiest. e.g. image generator teams can create giant grids of results to decide if one image is better than the other. thank you to the giant GPU in your brain built for processing images very fast.
- text is much harder. it is skimmable, but you have to read, it is semantic, discrete and precise so you also have to reason (esp in e.g. code).
- audio is maybe even harder still imo, because it force a time axis so it's not even skimmable. you're forced to spend serial compute and can't parallelize it at all.
You could say that in coding LLMs have collapsed (1) to ~instant, but have done very little to address (2). A person still has to stare at the results and discriminate if they are good. This is my major criticism of LLM coding in that they casually spit out *way* too much code per query at arbitrary complexity, pretending there is no stage 2. Getting that much code is bad and scary. Instead, the LLM has to actively work with you to break down problems into little incremental steps, each more easily verifiable. It has to anticipate the computational work of (2) and reduce it as much as possible. It has to really care.
This leads me to probably the biggest misunderstanding non-coders have about coding. They think that coding is about writing the code (1). It's not. It's about staring at the code (2). Loading it all into your working memory. Pacing back and forth. Thinking through all the edge cases. If you catch me at a random point while I'm "programming", I'm probably just staring at the screen and, if interrupted, really mad because it is so computationally strenuous. If we only get much faster 1, but we don't also reduce 2 (which is most of the time!), then clearly the overall speed of coding won't improve (see Amdahl's law).
Thrilled to kick off the Gemini 2.0 era with Gemini 2.0 Flash, an update to our workhorse model that outperforms even 1.5 Pro at twice the speed. It has really great multilingual skills, and can natively call tools, like Google Search. It’s the first release in the Gemini 2.0 family of models, with more to come soon.
This is really just the beginning. 2025 will be the year of AI agents and Gemini 2.0 will be the generation of models that underpin our agent-based work. We’re sharing a set of prototypes made possible by 2.0 Flash’s new capabilities: including an update to Project Astra, our vision for a universal AI assistant; the new Project Mariner, which explores the future of human-agent interaction, starting with your browser; and Jules, an AI-powered code agent that can help developers.
We’re also sharing a few other easter eggs, like: agents that help you navigate video games, which builds on our rich heritage of games breakthroughs in AI, and agents for robotics.
Evaluating LLMs in enterprise domains can be challenging. In this post, we share how our applied AI team synthesized high-quality code tests for specific libraries to enhance system performance. Joint work with MatthewHayes @matei_zaharia@ritendra!
#LLMs are revolutionizing code generation, but ensuring accuracy with domain-specific tools like Spark SQL is vital.
Discover how to synthesize tailored code tests for LLMs, offering a precise way to evaluate performance across any coding library. https://t.co/dyiAqWCeER
We are thrilled to announce the milestone release of SGLang Runtime v0.2, featuring significant inference optimizations after months of hard work.
It achieves up to 2.1x higher throughput compared to TRT-LLM and up to 3.8x higher throughput compared to vLLM. It consistently delivers superior performance when serving Llama-8B to 405B models on A100/H100 with FP8/BF16.
SGLang is fully open-source and implemented in Python. As it matures from a prototype, we invite the community to join us in creating the next-generation efficient serving engine!
Learn more at https://t.co/ipq3FL9MNi
We just launched Databricks Assistant Autocomplete, another context-aware AI feature using our data intelligence engine. Now your autocomplete in notebooks and SQL is aware of all the data in your catalog and how it is used! https://t.co/GLqrOHrNit
Today we released an open source model, DBRX, that beats all previous open source models on the standard benchmarks. The model itself is a Mixture of Experts (MoE), that's roughly twice the brains (132B) but half the cost (36B) of Llama2-70B. Making it both smart and cheap. Since only 36B expert parameters are used live, it's close to twice the speed (tokens/seconds) of Llama2-70B. We're excited to build custom versions of this for organizations that have proprietary data! Check it out!
https://t.co/KA5rLaCnQx
At Databricks, we've built an awesome model training and tuning stack. We now used it to release DBRX, the best open source LLM on standard benchmarks to date, exceeding GPT-3.5 while running 2x faster than Llama-70B. https://t.co/QEx7gND6UJ