Inference isn't everything, but it does require a new stack -- not Kubernetes, not SLURM.
At @modal, we dove deep to build that stack.
In this blog post, we explain how, from compute management & cloud-native caching to CRIU & GPU checkpointing.
https://t.co/DQ4wvuXjre
Introducing Dynamo Snapshot, our approach to fast startup of inference workloads on Kubernetes, which reduces startup time from minutes to under 5 seconds.
In production inference deployments, demand fluctuates over time. Cold-starting inference workloads can take minutes, leaving idle GPUs that generate no tokens and serve no requests.
Snapshot leverages GMS to enable concurrent weight restoration over a high-speed interconnect, while using Linux native AIO and parallel memfd restoration to accelerate CRIU restore performance.
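The post names the ingredients (GMS, Linux native AIO, parallel memfd restoration) but not the code. As a rough, generic illustration of the "parallel memfd restoration" idea only, the Python sketch below copies a checkpoint image into an anonymous memfd using several threads. Each worker uses positional `os.pread`/`os.pwrite`, so no shared file offset is involved. This is not Dynamo Snapshot's implementation: it does not use Linux native AIO (`io_submit`) or GMS, and `parallel_restore`, the chunk size, and the worker count are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # hypothetical chunk size (1 MiB), not a Dynamo or CRIU setting


def _anonymous_fd():
    """Return an fd for an anonymous file: a memfd on Linux, an unlinked temp file elsewhere."""
    if hasattr(os, "memfd_create"):
        return os.memfd_create("restore-sketch")
    fd, path = tempfile.mkstemp()
    os.unlink(path)  # keep the open fd, drop the name
    return fd


def _copy_range(src, dst, start, end):
    """Copy bytes [start, end) from src to dst at the same offsets, handling short I/O."""
    pos = start
    while pos < end:
        data = os.pread(src, end - pos, pos)
        if not data:
            raise EOFError(f"unexpected end of image at offset {pos}")
        view = memoryview(data)
        while view:
            written = os.pwrite(dst, view, pos)
            view = view[written:]
            pos += written


def parallel_restore(image_path, workers=4, chunk_size=CHUNK_SIZE):
    """Copy an image file into an anonymous fd with several threads; returns the new fd."""
    size = os.path.getsize(image_path)
    dst = _anonymous_fd()
    os.ftruncate(dst, size)  # pre-size so every worker can pwrite at its own offset
    src = os.open(image_path, os.O_RDONLY)
    try:
        ranges = [(off, min(off + chunk_size, size)) for off in range(0, size, chunk_size)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() drains the iterator so any worker exception is re-raised here
            list(pool.map(lambda r: _copy_range(src, dst, *r), ranges))
    except BaseException:
        os.close(dst)
        raise
    finally:
        os.close(src)
    return dst
```

Splitting the image into independent, non-overlapping ranges is what lets the copies proceed concurrently. A production restorer would also issue the reads asynchronously rather than block one thread per chunk.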
The rise of AI workloads, especially large language models, introduces new use cases for CRIU. The article (https://t.co/pXJW8JviU1) demonstrates how effectively CRIU can be applied to GPU workloads. #criu #nvidia #LLM
CRIU 4.1 "CRISCV" is here with RISC-V, PIDFD support, plus ARM64 PACs, CUDA enhancements, and more! Check out the details: https://t.co/4KKrCOS1w7 #CRIU #Linux
CRIU 4.0 (CRIUDA) is out!
This update introduces groundbreaking functionality for checkpointing and restoring NVIDIA CUDA workloads. Explore the complete changelog and download the latest version at:
https://t.co/UOgr5nwueP
#CRIU #NVIDIA #CUDA
@confusedqubit @PraveenPerera @pojntfx @mycoliza Regarding CRIU, do you use only the post-copy variant? Have you considered evaluating the pre-copy (pre-dump) approach? We welcome patches at any time!
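For contrast with post-copy (where the restored process faults in missing pages lazily, via `--lazy-pages`), the pre-copy approach runs one or more `criu pre-dump` rounds with `--track-mem` while the process keeps running. Each round uses `--prev-images-dir` to point at the previous round, so the final `criu dump` only has to save pages dirtied since then. The Python sketch below only builds the command lines for that workflow, following CRIU's documented iterative-migration pattern. The helper `iterative_dump_plan` and the `pre-N`/`final` directory names are hypothetical, and the directories must exist before the commands are run.

```python
import os


def iterative_dump_plan(pid, base_dir, pre_dumps):
    """Build argv lists for `pre_dumps` CRIU pre-dump rounds followed by a final dump.

    Each round writes into its own images directory and points --prev-images-dir
    at the previous round, so only pages dirtied since then are written again.
    """
    commands = []
    prev = None
    for i in range(1, pre_dumps + 1):
        cur = f"pre-{i}"  # hypothetical directory naming scheme
        argv = ["criu", "pre-dump", "--tree", str(pid),
                "--images-dir", os.path.join(base_dir, cur), "--track-mem"]
        if prev is not None:
            # CRIU resolves this path relative to --images-dir
            argv += ["--prev-images-dir", f"../{prev}"]
        commands.append(argv)
        prev = cur
    final = ["criu", "dump", "--tree", str(pid),
             "--images-dir", os.path.join(base_dir, "final"), "--track-mem"]
    if prev is not None:
        final += ["--prev-images-dir", f"../{prev}"]
    commands.append(final)
    return commands


if __name__ == "__main__":
    for argv in iterative_dump_plan(1234, "/tmp/ckpt", pre_dumps=2):
        print(" ".join(argv))
```

Only the final `dump` stops the process. The pre-dump rounds shrink the amount of memory that must be copied during that freeze, which is the trade-off pre-copy makes against post-copy's lazy page transfer.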
We're happy to announce that registration for LPC 2023 is now open. To register, please go to our attend page.
To try to prevent the instant sellout, we've updated our cancellation policy: no refunds, only transfers of registrations.
https://t.co/HRXEwJoRkA
#LinuxPlumbers
@ebiken Thanks for that link. According to the video, Microsoft worked with AMD and Nvidia to implement CRIU support. Upstream, we are not aware of any work with Nvidia GPUs. We only know about AMD's GPU support:
https://t.co/2r5UJsLBMC
Forensic container analysis
In #Kubernetes, it is possible to create a checkpoint of a running container without stopping it and without the container knowing.
@adrian__reber describes how to analyze a checkpoint using tools like checkpointctl, tar, crit and gdb.
https://t.co/lMino5kb4h
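checkpointctl, crit and gdb are the tools for real analysis. For a quick first look before reaching for them, the Python sketch below uses only the standard library's `tarfile` to sort a checkpoint archive's files into CRIU image files, container metadata, and everything else. The layout it assumes (CRIU images under `checkpoint/`, metadata in top-level `*.dump` files such as `config.dump`) is based on CRI-O-created checkpoints and may differ across runtimes and versions. `summarize_checkpoint` is a hypothetical helper.

```python
import tarfile


def summarize_checkpoint(archive_path):
    """Group the regular files in a container checkpoint archive by rough role.

    Assumes a CRI-O-style layout: CRIU images under checkpoint/*.img and
    container metadata in *.dump files. Anything else (for example a root
    file system diff) goes under "other".
    """
    summary = {"criu_images": [], "metadata": [], "other": []}
    with tarfile.open(archive_path) as tar:  # default mode "r" auto-detects compression
        for member in tar.getmembers():
            if not member.isfile():
                continue
            name = member.name[2:] if member.name.startswith("./") else member.name
            if name.startswith("checkpoint/") and name.endswith(".img"):
                summary["criu_images"].append(name)
            elif name.endswith(".dump"):
                summary["metadata"].append(name)
            else:
                summary["other"].append(name)
    for names in summary.values():
        names.sort()
    return summary
```

From there, `crit show` can decode an individual image file (for example `checkpoint/pstree.img`) to inspect the process tree.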
I've known about @__criu__ since 2015, from the LXC days, and now I've had a chance to use it with @Podman_io. It feels like witchcraft: a lightweight alternative to Firecracker snapshots.
We have a special project with CRIU at @resmoio, to be released in March. Very excited about it!