bandish

@bandish

Engineer @MosaicML, I work on making DL efficient and accessible.

San Francisco, CA

Joined December 2008

468 Following

238 Followers

178 Posts

bandish retweeted

Unconventional AI @unconvAI

3 days ago

Lewis Hamilton recently pointed to a mismatch between simulation and reality as a major factor behind a challenging race weekend. It's a problem every systems engineer recognizes: how do you simulate a system operating at the edge of chaos? In our latest blog, we explore how Unconventional AI builds digital twins that faithfully model hardware operating in these highly sensitive regimes, capturing the physical and numerical effects that can significantly influence real-world performance. Read the full post: https://t.co/f1mv15xMcZ

bandish retweeted

Unconventional AI @unconvAI

24 days ago

Most real-world systems are dynamic. So why do we still treat computation as static? Our latest blog explores computation through motion using gyroscopes, rods, springs, and ordinary differential equations to perform handwritten digit classification. A deep dive into: • dynamical systems as compute • differentiable ODE solvers • physics-inspired machine learning • emergent computation through interaction Read here: https://t.co/pIwwvT72Bw

264

214

70K

bandish retweeted

Unconventional AI @unconvAI

about 1 month ago

Tomorrow, May 15, is the final day to submit pre-proposals for the Unconventional Grant. Over the past several weeks, we’ve seen proposals spanning: • computation as dynamics • in-memory and in-physics compute • architectures that minimize data movement • new abstractions beyond linear algebra Many converge on the same intuition: meaningful efficiency gains in AI will not come from scaling existing approaches alone, but from fundamentally different ways of representing and computing. We are looking for technically grounded ideas that challenge assumptions across hardware, systems, and learning. We’re not looking for taller ladders to the moon. We’re looking for rockets. https://t.co/7ups8vMcdZ

bandish retweeted

Unconventional AI @unconvAI

about 1 month ago

Getting to 1000x energy efficiency in AI isn’t about one breakthrough. It’s about solving two hard constraints: 1. Data movement dominates energy 2. Amdahl’s Law caps system-level gains Which means you have to rethink everything: models, hardware, and how they’re designed together. If this kind of problem excites you, you’ll enjoy our latest blog: https://t.co/3FFKWIm1nc

34K


bandish retweeted

Naveen Rao

@NaveenGRao

about 1 month ago

At [un] @unconvAI we're not only rethinking computers, but also how intelligence emerges from the physical world. We're working at the frontier and our computing primitives are physics. If you're interested in the intersection of nonlinear dynamics and language and reasoning, apply here: https://t.co/iGpuKvuJxe

bandish retweeted

Databricks AI Research

@DbrxMosaicAI

3 months ago

Meet KARL: a faster agent for enterprise knowledge, powered by custom reinforcement learning (now in preview). Enterprise knowledge work isn’t just Q&A. Agents need to search for documents, find facts, cross-reference information, and reason over dozens or hundreds of steps. KARL (Knowledge Agent via Reinforcement Learning) was built to handle this full spectrum of grounded reasoning tasks. The result: frontier-level performance on complex knowledge workloads at a fraction of the cost and latency of leading proprietary models. These advances are already making their way into Agent Bricks, improving how knowledge agents reason over enterprise data. And Databricks customers can apply the same reinforcement learning techniques used to train KARL to build custom agents for their own enterprise use cases. Read the research → https://t.co/eFyXxCWUAd Blog: https://t.co/03sLHTUcLl


519

600

407K

bandish retweeted

Ali Ghodsi

@alighodsi

10 months ago

Databricks just signed a Series K term sheet at >$100B valuation to scale two flagship products: 🔥 Lakebase — serverless Postgres with true compute/storage separation 🧠 Agent Bricks — agentic framework with built-in reasoning guardrails for enterprise data https://t.co/rgM5vMggwe

106

252

220K

bandish retweeted

Jonathan Frankle

@jefrankle

11 months ago

RLVR isn't just for math and coding! At @databricks, it's impacting products and users across domains. One example: SQL Q&A. We hit the top of the BIRD single-model single-generation leaderboard with our standard TAO+RLVR recipe - the one rolling out in our Agent Bricks product. https://t.co/JAsXpPdumd

107

23K

bandish retweeted

Jonathan Frankle

@jefrankle

11 months ago

I'm at ICML 🇨🇦 and I'm hiring at @databricks. Visit our booth if you're interested. My scientific focus: It's 1972 in AI, there's an AI crisis, Dijkstra isn't here to save us, and maybe RL can. Why Databricks? The long road to AGI is being paved here and we have the real evals 🧵

225

42K

bandish retweeted

Davis Blalock

@davisblalock

12 months ago

Deep learning training is a mathematical dumpster fire. But it turns out that if you *fix* the math, everything kinda just works…fp8 training, hyperparameter transfer, training stability, and more. [1/n] https://t.co/wtiPo5pFsL

148

189K

bandish @bandish

12 months ago

@davisblalock Congrats, really great to see this out!

183

bandish retweeted

Databricks @databricks

over 1 year ago

Thrilled to partner with @AIatMeta to release the latest Llama 3 models on Databricks. The Llama 3.2 release pushes the frontier of enterprise GenAI w/smaller models for cost-sensitive use cases & larger multimodal models. Available in Databricks Mosaic AI https://t.co/P5gYzceh9F

bandish retweeted

Databricks @databricks

over 1 year ago

Mosaic AI Model Training now supports 131K tokens for fine-tuning Meta Llama 3.1! Build even more powerful RAG and tool use systems with long context enterprise data: https://t.co/UUDnJl6Ymz

bandish retweeted

Sasha Doubov @sashadoubov

almost 2 years ago

some notes from paper! - 405B trained on 15.6T tokens, 3.8e25 flops - use SFT, rejection sampling and DPO - annealing is used to judge quality of domain specific data (s/o dbrx paper)

bandish retweeted

Databricks AI Research

@DbrxMosaicAI

almost 2 years ago

Popular #LLM scaling laws only factor in training costs, and ignore the costs of deployment. In a paper presented at @icmlconf 2024, @databricks Mosaic AI researchers Nikhil Sardana, @JacobianNeuro, and @sashadoubov propose a modified scaling law that considers the cost of both training and inference and experimentally demonstrate how “overtrained” LLMs can be the optimal choice: https://t.co/3HgHD0RBEO


17K

bandish @bandish

almost 2 years ago

@NaveenGRao Just use our brains for compute

bandish retweeted

Vitaliy Chiley

@vitaliychiley

almost 2 years ago

🎶🎶 Do you want to build an MoE? 🎶🎶 It was a great collaboration with the team at PyTorch to integrate the tooling needed to make MoE training easier and more efficient.

bandish retweeted

Mihir Patel @mvpatel2000

almost 2 years ago

Fun collaboration between @DbrxMosaicAI and @PyTorch team! We've been working hard to scale MoEs and PyTorch distributed to thousands of GPUs, and this is a great summary of a lot of the cool things we've added to PyTorch. Quick rundown (1/N)

115

31K

bandish @bandish

almost 2 years ago

@dylan522p @mvpatel2000 @cis_female Edit: I mean when they have a system we shall see

bandish retweeted

Rishab Parthasarathy @rishab_partha

almost 2 years ago

We are excited to announce Vid3D, a technique for generating 3D video using only 2D video diffusion models and Gaussian splatting! Paper: https://t.co/RnbnyRZHJU Github: https://t.co/ZmYJEe6hOb Project Page: https://t.co/gYQXnb9xkX
