MLCommons

@MLCommons

Better Artificial Intelligence for Everyone

Joined September 2020

149 Following

3.6K Followers

825 Posts

MLCommons @MLCommons

5 days ago

The security world's "find it → patch it → disclose it" model doesn't work for AI. You can't patch a released open-weight model. The weights are already out there — forever. MLCommons is building the disclosure standard AI evaluation actually needs. https://t.co/zFgTE8dMBQ

MLCommons @MLCommons

12 days ago

AI systems co-design is too fragmented. Enter MLCommons Chakra (#MLSys2026): an open execution trace ecosystem to bridge software & hardware without exposing IP. Native in @PyTorch, NVIDIA NeMo, & vLLM. https://t.co/a6h3S7j6XT

MLCommons @MLCommons

13 days ago

Meet GeoCroissant. Built on MLCommons Croissant, it adds Earth observation-specific metadata—from coordinate systems to spatial resolution—to give you better traceability and more reproducible workflows for agentic AI pipelines. https://t.co/Hzo4dH0F4K

MLCommons @MLCommons

26 days ago

Introducing the 2026 @MLCommons Rising Stars! 🌟 We’ve selected 39 outstanding early-career researchers from 26 global institutions who are shaping the future of ML systems, hardware-software co-design, and trustworthy AI. Meet the cohort: https://t.co/yovGC0i7Sv #AI #MLCommons

MLCommons @MLCommons

26 days ago

The median AI benchmark longevity score is 5/100. AILuminate scored 75—but even that degrades over time. To fix this, the @MLCommons AIRR team built the Continuous Prompt Stewardship System to keep risk evaluation fresh and reliable. https://t.co/a6bBJyEZSb

MLCommons @MLCommons

27 days ago

What does AI reliability actually require? It comes down to consistently following the right behavioral rules—even under adversarial attack. Meet the AI Reliability Map to guide pre-deployment testing. Explore the framework: https://t.co/VzFkrkkFYf #AIReliability #AI

MLCommons @MLCommons

about 1 month ago

Do tools like OpenClaw signal a turning point for mainstream AI adoption? MLCommons' Dave Graham debated that and more on the Utilizing AI podcast. What do you think? https://t.co/1HfXUgZfV2 #AgenticAI #AI

MLCommons @MLCommons

about 1 month ago

MLPerf Training v6.0 has added GPT-OSS 20B. With 21B total parameters (but only 3.6B active per token), this new sparse MoE pretraining benchmark is designed specifically for accessibility—it can run on a single 8-GPU node. https://t.co/qiZAyFSoj0

MLCommons @MLCommons

about 1 month ago

Mixture-of-Experts (MoE) architectures like DeepSeek-V3 are the new standard for scaling frontier LLMs. Now, that architecture is part of MLPerf Training v6.0. https://t.co/jSKOWI4f6v

MLCommons @MLCommons

about 1 month ago

AI Risk and Reliability certification shouldn't be a self-assessment. That's the premise behind the AILuminate Global Assurance Program (GAP). GAP gives organizations an independent path to certify that their AI systems meet established safety standards. https://t.co/2Xu9UtTwoz

MLCommons @MLCommons

about 1 month ago

MLPerf Endpoints: decoupled client, any endpoint, zero-effort integration. Cloud or bare-metal — evaluated equally. Built for API-first GenAI. https://t.co/fPDH7hXj8d #MLPerf

MLCommons @MLCommons

about 1 month ago

Great to see Microsoft highlighting the need for global collaboration on AI safety testing—and shouting out the MLCommons community’s ongoing work to expand the AILuminate benchmarks for multilingual and multimodal testing. https://t.co/DPefMbiC1v

MLCommons @MLCommons

about 1 month ago

The New Wave of AI in Healthcare 2026 symposium kicks off today in NYC! 5/13 at 10:50 AM, MLCommons' Andrew Gruen, PhD will be taking the stage. If you're attending, don't miss this conversation on trust, accountability, and AI validation in medicine. https://t.co/EB08squ1F8

MLCommons @MLCommons

about 1 month ago

AI software optimization is now moving faster than hardware cycles. To capture these rapid gains, MLPerf is shifting to a rolling submission cadence. David Kanter explains why this speed matters for enterprise buyers via Nutanix: https://t.co/YmoB3VLpt4 #MLPerf #AI

MLCommons @MLCommons

about 1 month ago

Submissions for MLPerf Training v6.0 are open! This round brings updates, including the introduction of large-scale MoE pretraining architectures. Whether you benchmark on a single 8-GPU node or a massive cluster, we want your results in this round. https://t.co/oOA3g6lFut

MLCommons @MLCommons

about 1 month ago

We're thrilled to welcome @flwrlabs to MLCommons to help shape standards for federated AI at scale. First up: MedPerf is integrating with Flower, enabling researchers to run federated clinical AI studies without moving sensitive patient data. More: https://t.co/fMY5PN1wqj

MLCommons @MLCommons

about 1 month ago

Measuring today’s production workloads is getting harder. The Inference working group stepped up by adding GPT-OSS 120B, DeepSeek-R1, and our first text-to-video generation benchmark. https://t.co/avj9D2nQ68

MLCommons @MLCommons

about 1 month ago

MoE benchmarking doesn't require a supercomputer. MLPerf Training v6.0 introduces GPT-OSS 20B: a sparse Mixture-of-Experts pretraining benchmark that can run on a single 8-GPU node. See how the task force engineered away statistical variance (CV < 5%): https://t.co/iH5TbLSbrY

MLCommons @MLCommons

about 1 month ago

Mixture-of-Experts (MoE) is coming to MLPerf Training v6.0. The new DeepSeek-V3 large-scale pretraining benchmark captures critical innovations like MLA, fine-grained expert segmentation, and MTP at production scale (671B parameters). Technical details: https://t.co/i8CWkbdU6o

MLCommons @MLCommons

about 2 months ago

Security theater vs. rigorous AI benchmarking - the difference is methodology. AILuminate Jailbreak v0.7: a mechanism-first taxonomy for single-turn jailbreak attacks. Defensible. Reproducible. Auditable. https://t.co/QwANlhpw1d #AILuminate #AISecurity
