One of Europe's largest @nvidia GB300 NVL72 deployments is coming online, and @MistralAI has adopted the VAST AI Operating System as part of its data foundation.
More on the partnership: https://t.co/C0wLn7zctN
Today, VAST Data announced our Series F at a $30 billion valuation.
This milestone reflects accelerating demand for a new data infrastructure stack purpose-built for AI.
Learn more: https://t.co/NIVRolXQEZ
VAST FWD will feature insights from NVIDIA Founder and CEO Jensen Huang on NVIDIA’s journey building AI infrastructure in collaboration with VAST. From training to inference to agent-based systems, Jensen outlines how enterprises are putting AI to work across their organizations.
In the blink of an eye, AI storage explodes in capacity by roughly 1,230% (see math below). This week, NVIDIA introduced a massive unlock to GPU efficiency: a new specialized AI storage architecture that extends the context (tokens) processed in HBM, which can now spill down into shared NVMe storage. By saving context in a KV cache, inference systems avoid the cost of recomputing context (for large-context inference), lowering time-to-first-token by 20x or more.
What people don't realize is that this is an altogether new data generator - and not only does the market need a new approach to storage speed and efficiency, but many (regulated) AI labs will still need enterprise data management capability which cannot be sacrificed for raw speed.
NVIDIA calls this the Inference Context Memory Storage (ICMS) Platform. We've been working with them for weeks now to pioneer a new way to configure VAST systems that provides ultimate efficiency, by embedding the core logic of VAST systems directly into a GPU machine's BlueField DPU.
**The 12x is no joke. I did the math today.**
- A standard VAST system, minimally configured for an NCP (NVIDIA Cloud Partner), has roughly 1.3TB of data per GPU in a GB200-class cluster.
- When we add additional infrastructure for context memory extension, GPUs will require an additional 16TB as we step into the Vera Rubin era. 12.3x.
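
For anyone who wants to check the 12x, here is a quick sketch that uses only the two per-GPU figures quoted above. The function and variable names are illustrative, and the "total per GPU" line assumes the 16TB sits on top of the 1.3TB baseline rather than replacing it.

```python
# Per-GPU figures quoted in the post; everything else here is illustrative.
BASELINE_TB_PER_GPU = 1.3   # standard NCP configuration, GB200-class cluster
CONTEXT_TB_PER_GPU = 16.0   # additional context-memory capacity, Vera Rubin era


def capacity_ratio(baseline_tb: float, added_tb: float) -> float:
    """Ratio of the added context capacity to the baseline capacity."""
    return added_tb / baseline_tb


ratio = capacity_ratio(BASELINE_TB_PER_GPU, CONTEXT_TB_PER_GPU)
# Assumption: the context tier is additive to the baseline tier.
total_tb = BASELINE_TB_PER_GPU + CONTEXT_TB_PER_GPU

print(f"added / baseline = {ratio:.1f}x ({ratio * 100:.0f}%)")
print(f"total per GPU    = {total_tb:.1f} TB")
```

The ratio comes out at about 12.3x, which as a percentage is on the order of 1,230%.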
Why @VAST_Data , you might ask?
1. Our parallel DASE architecture allows us to embed VAST servers directly into each BlueField server. This not only reduces infrastructure requirements vs. conventional configurations where separate x86 servers were shared by GPU clients, it also changes the fundamental client:server paradigm... where for the first time every GPU client machine now has its own dedicated server. VAST's parallel Disaggregated, Shared-Everything architecture makes it possible to embed servers in each client without introducing cross-talk across VAST servers as would be the case for any other storage technology.
Each server then connects directly to all of the cluster's SSDs, requiring a single zero-copy hop to get to all of the shared context, so any machine can retrieve context in real-time. The efficiency and scale of this architecture are unprecedented.
2. While we can get great performance by stripping down data services that run in BlueField, our embarrassingly-parallel architecture allows us to hang additional servers off the same fabric to provide optional background enterprise data management... bringing capabilities such as data protection, audit, encryption and up to 2:1 KVCache data reduction to a cluster that has an ultra-streamlined data path to the GPU.
With VAST, AI labs don't have to choose...
They can get performance and killer global data management features.
This space is evolving right now... lots of room to invent.
DM me to co-develop the future of accelerated inference systems with us.
https://t.co/BNxhiYD8ZO
Inverse correlation: The bigger the frontier model training job, the less I/O you need per GPU. This is one of the counter-intuitive learnings that @glennklockwood teases out after analyzing nearly 100,000 checkpoint operations on frontier model training systems.
@VAST_Data is bringing all the maths to help you all appreciate the requirements of running AI training infrastructure at extreme scale. Read more here: https://t.co/hZhDZF2IdF
@EFIEBER_ANDRE What I find more impressive is BYD's nerve to build a vehicle whose battery is empty in less than 3 minutes when you call up the full motor output.
80 kWh / ~2,200 kW ≈ 0.036 h ≈ 2 min 11 s
@EFIEBER_ANDRE For FSD and Optimus, Tesla needs one thing above all: data and compute. No other car or robotics maker has access to the kind of compute that xAI already has today with its 230,000 GPUs. That gives Tesla a competitive advantage.
@EFIEBER_ANDRE In Gießen I had the best service experience of my life. No waiting at all, everything went perfectly.
And as someone who has already worn out more than 14 company cars of various brands in his life, I know what I'm talking about.
Every enterprise is racing to adopt AI. Few are ready for what it actually demands.
AI at scale breaks legacy systems. It floods storage. It overwhelms compute. It creates trillions of agents that need real-time context and global coordination.
That’s why VAST built an entirely new operating system. The VAST AI OS is the world’s first platform built to power the agentic AI era — unifying exabyte-scale data, millions of GPUs, and intelligent compute from edge to cloud.
AI needed an operating system. So VAST Data built one.
This video is just the start. Discover how the VAST AI OS brings agentic computing to life: https://t.co/J8Slo022er
I would argue that we're 100x more relevant in the age of scalable inference. We've got customers gearing up to deploy tens of millions of agents. They need dynamic and scalable access to all data (structured & unstructured), in real-time, with security, with QoS and global access. These are all hallmarks of @VAST_Data and black spots on the records of legacy players.
@Chris_Mellor (2) It is then difficult to prevent an attacker from logging into the object storage system as, say, a storage administrator and deleting entire buckets. The end result stays more or less the same.
@Chris_Mellor (1) While your assumption seems correct (supposing object versioning), the ransomware attack is not only aimed at encrypting data. Rather, it attempts to gain control of the network (credentials) and simply delete backups and other data that cannot be encrypted.
We're pumped to announce our first open source project🔹 VUA🔹 . VAST Data's Undivided Attention is our approach to giving AI agents infinite memory by extending tools like vLLM and NVIDIA Dynamo with a third tier of shared (undivided) context (attention).
The objective here is to lower time-to-first-token by giving AI machines much larger cache spaces. In our testing, VUA can lower the time to first token by as much as 75%, saving precious GPU time and enhancing the application experience. By extending an AI model's memory space to petabytes (or more) of kvcache, organizations can:
- affordably deploy models with super-large context windows... think terabytes of model memory (like Llama 4, which can sport up to 5TB of context data!!)
- support multi-turn inference sessions that bounce around GPU machines over time... VUA makes sure you never have to re-compute a session history (compute time otherwise scales quadratically as the context length grows... mucho expensivo).
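
To make the re-compute point concrete, here is a toy cost model (not VUA code, and not a benchmark). It assumes prefill cost is dominated by causal attention, so each token costs as much as the number of tokens it attends to, and it ignores generated tokens, MLP cost and I/O time. `prefill_cost` and `session_cost` are hypothetical names.

```python
def prefill_cost(context_len: int, cached_len: int = 0) -> int:
    """Token pairs attended when prefilling tokens cached_len+1..context_len.

    Under causal attention, token i attends to i tokens (itself included),
    so a full prefill of n tokens costs n * (n + 1) / 2 pairs: quadratic in n.
    """
    n, c = context_len, cached_len
    return n * (n + 1) // 2 - c * (c + 1) // 2


def session_cost(turn_lengths: list[int], reuse_cache: bool) -> int:
    """Total prefill cost of a multi-turn session, with or without KV reuse."""
    total = 0
    history = 0
    for new_tokens in turn_lengths:
        context = history + new_tokens
        # Without a shared cache, the whole history is re-prefilled each turn.
        cached = history if reuse_cache else 0
        total += prefill_cost(context, cached)
        history = context
    return total


turns = [1000] * 10  # ten turns of 1,000 new tokens each (illustrative)
without_cache = session_cost(turns, reuse_cache=False)
with_cache = session_cost(turns, reuse_cache=True)
print(f"recompute every turn: {without_cache:,} token pairs")
print(f"reuse cached history: {with_cache:,} token pairs")
print(f"ratio: {without_cache / with_cache:.2f}x")
```

In this model the gap widens as sessions get longer, because every uncached turn pays the quadratic prefill for the entire history again.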
🔹 Why open source? 🔹
We're big believers in standard interfaces. When we find standards we like, we help improve them so that access is always standardized (past examples... NFSoRDMA, NVMe/Fabrics). Now with this extension to standard inference frameworks, we're hoping popular tools like vLLM and others just adopt this SW for the benefit of the whole industry.
🔹 What does it work on? Most Everything! (but VAST is best 😘)🔹
VUA can work on any NFS endpoint, but VAST's Data Platform brings some real advantage to the table:
- a parallel NFS architecture that can handle any level of metadata intensity (which is important for prefix-based search)
- RDMA support for NFS (and soon S3) that optimizes both writes and reads
- NFS and S3 lifecycle policies to help customers manage capacity and enable the system to delete stale KV caches automatically. KV caches can throw off terabytes of data per GPU per day, so you need intelligent and simple data management mechanisms to ensure that cost doesn't run away.
To download the code, visit the ol' VAST Data Github page:
https://t.co/RWpCAjFzxI
To read more about VUA, check out the blog written by Dave Graham, @Dan Aloni, Alon Horev and Matthew Rogers here: https://t.co/5xf4spMQDO
Today, I want to call special attention to a new @VAST_Data Platform capability that is a testament to the awesome power of VAST's Disaggregated, Shared Everything (DASE) architecture.
Almost every data lake, data warehouse and event streaming platform is built upon the shared-nothing architecture popularized by the Google File System in 2003. The east-west traffic created by shared-nothing clusters has made transactional I/O nearly impossible to scale... internal communication overwhelms clusters as they scale. This has created data silos in data pipelines... where events are first captured in a transactional bus and then shipped to some data lake/house. Silos = lost analytics opportunities.
VAST's new embarrassingly-parallel architecture eliminates east-west traffic, therefore making it possible to ingest real-time events, database transactions, cross-table change data capture at any level of scale... add servers, scale data services.
Today we bring an end to compromise.
- Stream real-time events directly into your data warehouse, at up to 100s of millions of events per second
- Topics are stored as tables, where new tabular data is columnarized once it rolls off the VAST Data Platform write buffer
- Events can be analyzed and correlated against all of the tables in the VAST DataBase using database sorts and indexes... giving customers a unified analytics environment that eliminates the divide between real-time and batch data analytics
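
To picture "topics are stored as tables", here is a toy in-memory model, not the VAST DataBase implementation. It assumes events with a fixed schema land in a row-oriented write buffer and are turned into columns once the buffer fills, while queries read across both. `TopicTable` and its `flush_threshold` parameter are hypothetical.

```python
from collections import defaultdict


class TopicTable:
    """Toy model: events land in a row buffer, then flush into columns."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold
        self.row_buffer: list[dict] = []   # recent events, row-oriented
        self.columns = defaultdict(list)   # flushed events, column-oriented

    def append(self, event: dict) -> None:
        """Ingest one event; columnarize when the buffer is full."""
        self.row_buffer.append(event)
        if len(self.row_buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Move buffered rows into per-column lists (assumes a fixed schema)."""
        for row in self.row_buffer:
            for name, value in row.items():
                self.columns[name].append(value)
        self.row_buffer.clear()

    def scan(self, column: str) -> list:
        """Read one column across flushed and still-buffered events."""
        return self.columns[column] + [row[column] for row in self.row_buffer]


topic = TopicTable(flush_threshold=3)
for i in range(4):
    topic.append({"sensor": f"s{i % 2}", "temp": 20 + i})

temps = topic.scan("temp")
print(f"buffered rows: {len(topic.row_buffer)}, temps seen by a query: {temps}")
```

The point of the sketch is that a query sees the newest event and the columnarized history through one read path, so there is no separate streaming store to reconcile.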
Today, we essentially eliminate the need for an event streaming platform because VAST's DataBase handles streaming and analytics on the same data at any scale. We're not cutting into a new market category, we're eliminating the category altogether.
https://t.co/RpPOxKa2uU