Presenting at Upper Bound 2026 on how ML research platforms have evolved past traditional HPC schedulers.
"Beyond Slurm: Modern Advancements in Machine Learning Research Platforms"
Thu May 21 · 3–4pm MDT
MacEwan Stage (Salon 2)
Would be glad to connect with others there!
#UpperBound2026 @AmiiThinks @transformerlab
Here's a demo of how easy it is to train a text-to-speech model in Transformer Lab.
🎙️ Base model: orpheus-3b-0.1-ft
📚 Dataset: campwill/HAL-9000-Speech
📝 Eval: bosonai/EmergentTTS-Eval
🧪 Train, sample, and listen back without leaving the UI
⌨️ GUI shown here, agent-friendly CLI also available
Get started: https://t.co/eQQV6dOabk
Links to artifacts👇
.@transformerlab has integrated with dstack!
Transformer Lab is an agent-friendly ML research platform for training models with modern experiment tracking, automated hyperparameter sweeps, and persistent storage across ephemeral nodes.
With dstack, those workflows run across any GPU cloud or on-prem cluster.
Both projects are open source. Check it out 👇🏽
https://t.co/ZfceTxQd3Y
Transformer Lab now integrates with @dstackai.
Run your full research workflow in Transformer Lab while orchestrating compute through dstack across GPU clouds, Kubernetes, and bare-metal clusters.
⚙️ Provision GPUs via dstack's unified control plane for compute orchestration.
🧪 Track experiments seamlessly via Transformer Lab; no digging through scattered logs across clusters / environments.
📦 Checkpointing, auto-recovery, and global object storage to scale experiments.
🤖 Hyperparameter sweep automation to test more model configs faster.
All open source.
Get started here: https://t.co/kUrCNlUt19
Transformer Lab “tasks” let you run complex ML workflows with a single click.
Import a task from the Task Gallery, configure your parameters, and run. Each task packages all setup and dependencies so you skip the troubleshooting.
🎬 Our Wan2.1 text-to-video task is a great example. Running it normally requires significant setup work. As a Transformer Lab task, you import it with one click, type a prompt, and you're generating video.
🧪 The Task Gallery covers training, fine-tuning, evaluation and more.
🛠️ Create your own tasks and share them with your team.
💻 Runs on your local machine, an on-prem cluster or a cloud provider like @runpod.
Open source and free. Try it out: https://t.co/eQQV6dNClM
Runpod is now natively supported in @transformerlab!
Add your API key and get experiment tracking, automatic checkpointing with failure recovery, persistent artifact storage, and interactive sessions (Jupyter, VSCode, vLLM) on your Runpod GPUs.
Get started 👇
🚀 Support for Runpod is live on Transformer Lab for Teams.
Add your Runpod API key and start running workloads on Transformer Lab for Teams using Runpod instances.
What you can do:
⚡ Queue workloads to run automatically or reserve an on-demand instance with Jupyter, VSCode and vLLM on dedicated Runpod GPUs
🧪 Submit training and eval jobs with built-in experiment tracking
🔄 Automate checkpointing and failure recovery. If an instance drops, your job restarts from the last saved checkpoint
💾 Store artifacts persistently, so model weights and eval results are accessible after the Runpod instance terminates
🔗 Supports SLURM and SkyPilot so teams that use Runpod alongside on-prem clusters can manage everything from a unified interface
Get started here: https://t.co/6pwxngVtso
We launched ComfyUI as a task in Transformer Lab.
Set up Transformer Lab, pick any compute you have access to, launch the task, and you're in ComfyUI. No environment setup.
🖥️ Run on an isolated Runpod pod
🏗️ Run on your own HPC cluster
💻 Run locally
All from within Transformer Lab. Same interface, same workflow.
If you've been using pre-built templates to avoid the setup pain, this does the same thing but on whatever compute you want, including your own hardware.
Open source and free. Get started here: https://t.co/eQQV6dOabk
Looks like it’s confirmed Cursor’s new model is based on Kimi! It reinforces a couple of things:
- open-source keeps being the greatest competition enabler
- another validation for Chinese open-source, which is now the biggest force shaping the global AI stack
- the frontier is no longer just about who trains from scratch, but who adapts, fine-tunes, and productizes fastest (seeing the same thing with OpenClaw for example).
NVIDIA DGX support is now live in Transformer Lab.
Got an NVIDIA DGX Spark? Skip the hassle of setting up CUDA 13 and other ML libraries on your machine.
Transformer Lab handles environment setup while managing your entire workflow: training/fine-tuning/evals, tracking runs, storing datasets/checkpoints and more.
Try it out. Open source, free to use. Feedback welcome.
https://t.co/Z2ZUepWjk7
If you’re setting up an ML research cluster, we wrote this guide for you:
The Definitive Guide to Building a Machine Learning Research Cluster From Scratch:
• Technical blueprint covering everything from a single “under-the-desk” GPU server to a university-wide cluster for 1,000+ users
• Tried and tested configurations for drivers, orchestration, storage, scheduling, and UI with a bias toward modern, simple tooling that is open source and easy to maintain.
• Step-by-step install guides (CUDA, ROCm, k3s, Rancher, SLURM/SkyPilot paths)
We’ve helped research labs of all sizes build their ML platforms, with the goal of creating a unified environment that lets researchers do their best work.
Different budgets, different constraints, but the same questions come up:
• How do we evolve from a single workstation into shared compute gracefully?
• Which orchestrator / scheduler should we pick: SLURM, SkyPilot, Kubernetes, or something else?
• What storage approach won’t collapse once data + users grow?
• How do we avoid building a fragile set of scripts that are hard to maintain?
Read the full guide on GitHub (PRs/issues welcome): https://t.co/uiS09eDToF
Today https://t.co/jFknDoasSy joins Hugging Face
Together we will continue to build ggml, make llama.cpp more accessible and empower the open-source community. Our joint mission is to make local AI easy and efficient to use by everyone on their own hardware.
ML research teams have been stuck with outdated tooling for far too long.
Excited to see @TransformerLab launch Transformer Lab for Teams — bringing open, modern infrastructure to real research workflows.
Mozilla Ventures is proud to support this team.
@jameswlepage Excited for you to try it out. Join our Discord to chat with our engineers if you need help or have questions. https://t.co/xBI0E3rOsn