Martin

20 days ago

@SawyerMerritt So you can still annoy your neighborhood? Great. 😬

1

0

646

ScaleOut_Dude retweeted

Harrison Ford

@HarrisonFordLA

about 1 month ago

May the fourth be with you

3K

220K

51K

6K

7M

AI winners won't be decided by models; they'll be decided by data.

about 1 month ago

@sakacc @slack Same here for weeks. 🫩

0

8


ScaleOut_Dude retweeted

VΛST Data @VAST_Data

about 2 months ago

Today, VAST Data announced our Series F at a $30 billion valuation. This milestone reflects accelerating demand for a new data infrastructure stack purpose-built for AI. Learn more: https://t.co/NIVRolXQEZ

1

24

5

2

10K

ScaleOut_Dude retweeted

VΛST Data @VAST_Data

4 months ago

VAST FWD will feature insights from NVIDIA Founder and CEO Jensen Huang on NVIDIA’s journey building AI infrastructure in collaboration with VAST. From training to inference to agent-based systems, Jensen outlines how enterprises are putting AI to work across their organizations.

https://t.co/W6zOS7Z6YR

1

9

3

0

3K

ScaleOut_Dude retweeted

5 months ago


@JeffDenworth

In the blink of an eye, AI storage explodes in capacity by roughly 1,230% (see math below). This week, NVIDIA introduced a massive unlock to GPU efficiency: a new specialized AI storage architecture that extends the context (tokens) processed in HBM, which can now spill down into shared NVMe storage. By saving context in a KV cache, inference systems avoid the cost of recomputing context (for large-context inference), lowering time-to-first-token by 20x or more.

What people don't realize is that this is an altogether new data generator - and not only does the market need a new approach to storage speed and efficiency, but many (regulated) AI labs will still need enterprise data management capability which cannot be sacrificed for raw speed.

NVIDIA calls this the Inference Context Memory Storage (ICMS) Platform. We've been working with them for weeks now to pioneer a new way to configure VAST systems that provides ultimate efficiency by embedding the core logic of VAST systems directly into a GPU machine's BlueField DPU.

**The 12x is no joke. I did the math today.**

- A standard VAST system, minimally configured for an NCP (NVIDIA Cloud Partner), has roughly 1.3TB of data per GPU in a GB200-class cluster.

- When we add additional infrastructure for context memory extension, GPUs will require an additional 16TB as we step into the Vera Rubin era. 16TB / 1.3TB ≈ 12.3x.
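The arithmetic behind the 12.3x figure can be checked with a few lines of Python. This is only a sketch using the two per-GPU numbers quoted above; the constant and function names are illustrative and not part of any VAST or NVIDIA tool. Note that 12.3x corresponds to roughly 1,230% when written as a percentage.

```python
# Figures quoted in the post (TB of storage per GPU).
BASELINE_TB_PER_GPU = 1.3   # standard NCP configuration, GB200-class cluster
CONTEXT_TB_PER_GPU = 16.0   # additional context-memory capacity, Vera Rubin era


def growth_multiple(additional_tb: float, baseline_tb: float) -> float:
    """Return the additional capacity as a multiple of the baseline."""
    return additional_tb / baseline_tb


multiple = growth_multiple(CONTEXT_TB_PER_GPU, BASELINE_TB_PER_GPU)
total_tb = BASELINE_TB_PER_GPU + CONTEXT_TB_PER_GPU  # baseline plus the new tier

print(f"additional capacity: {multiple:.1f}x the baseline")
print(f"as a percentage:     {multiple * 100:.0f}%")
print(f"total per GPU:       {total_tb:.1f} TB")
```

The multiple compares the additional 16TB against the 1.3TB baseline; counting baseline plus extension, the total per GPU would be about 17.3TB.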

Why @VAST_Data, you might ask?

1. Our parallel DASE architecture allows us to embed VAST servers directly into each BlueField server. This not only reduces infrastructure requirements vs. conventional configurations, where separate x86 servers were shared by GPU clients; it also changes the fundamental client:server paradigm, because for the first time every GPU client machine has its own dedicated server. VAST's parallel Disaggregated, Shared-Everything architecture makes it possible to embed servers in each client without introducing cross-talk across VAST servers, as would be the case for any other storage technology.

Each server then connects directly to all of the cluster's SSDs, requiring a single zero-copy hop to reach all of the shared context, so any machine can retrieve context in real time. The efficiency and scale of this architecture are unprecedented.

2. While we can get great performance by stripping down the data services that run in BlueField, our embarrassingly parallel architecture allows us to hang additional servers off the same fabric to provide optional background enterprise data management, bringing capabilities such as data protection, audit, encryption and up to 2:1 KVCache data reduction to a cluster that has an ultra-streamlined data path to the GPU.

With VAST, AI labs don't have to choose...
They can get performance and killer global data management features.

This space is evolving right now... lots of room to invent.
DM me to co-develop the future of accelerated inference systems with us.

https://t.co/BNxhiYD8ZO

4

19

8

10

7K

6 months ago

I saw the future today. Thank you @teslaeurope for an incredible FSD experience.

0

7

ScaleOut_Dude retweeted

8 months ago


@JeffDenworth

Inverse correlation: The bigger the frontier model training job, the less I/O you need per GPU. This is one of the counter-intuitive learnings that @glennklockwood teases out after analyzing nearly 100,000 checkpoint operations on frontier model training systems.

@VAST_Data is bringing all the maths to help you all appreciate the requirements of running AI training infrastructure at extreme scale. Read more here: https://t.co/hZhDZF2IdF

0

3

1

1K

9 months ago

@KevinNothnick @EFIEBER_ANDRE Definitely less than 3 minutes https://t.co/r8S87QAex4

9 months ago

@EFIEBER_ANDRE What I find more impressive is BYD's courage to build a vehicle whose battery is empty in less than 3 minutes when the full motor power is drawn. 80 kWh / ~2200 kW ≈ 0.036 h ≈ 2 min 10 s.
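The drain-time estimate is a plain energy-over-power division. A minimal sketch, assuming constant full power draw and no losses (the function name is illustrative):

```python
def drain_time_seconds(capacity_kwh: float, power_kw: float) -> float:
    """Time to empty a battery at constant power, ignoring losses."""
    return capacity_kwh / power_kw * 3600.0


seconds = drain_time_seconds(80.0, 2200.0)   # figures from the tweet
minutes, rest = divmod(seconds, 60)
print(f"{int(minutes)} min {rest:.1f} s")
```

Rounding the intermediate result to 0.036 h before converting gives 2 min 9.6 s; carrying full precision gives about 2 min 11 s. Either way the result is well under 3 minutes.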

4

18

0

1K

1

0

241


11 months ago

@EFIEBER_ANDRE For FSD and Optimus, Tesla needs one thing above all: data and compute. No other car or robotics manufacturer has access to the kind of compute that xAI already has available today with its 230,000 GPUs. That gives Tesla a competitive advantage.

1

3

0

257

11 months ago

@EFIEBER_ANDRE In Gießen I had the best service experience of my life. No waiting at all, everything went perfectly. And as someone who has already worn out more than 14 company cars of various brands in his life, I know what I'm talking about.

0

1

0

73

12 months ago

@EFIEBER_ANDRE Yes

0

51

ScaleOut_Dude retweeted

VΛST Data @VAST_Data

12 months ago

Every enterprise is racing to adopt AI. Few are ready for what it actually demands. AI at scale breaks legacy systems. It floods storage. It overwhelms compute. It creates trillions of agents that need real-time context and global coordination. That’s why VAST built an entirely new operating system. The VAST AI OS is the world’s first platform built to power the agentic AI era — unifying exabyte-scale data, millions of GPUs, and intelligent compute from edge to cloud. AI needed an operating system. So VAST Data built one. This video is just the start. Discover how the VAST AI OS brings agentic computing to life: https://t.co/J8Slo022er

0

8

3

0

705

ScaleOut_Dude retweeted

about 1 year ago

I would argue that we're 100x more relevant in the age of scalable inference. We've got customers gearing up to deploy tens of millions of agents. They need dynamic and scalable access to all data (structured & unstructured), in real time, with security, with QoS and global access. These are all hallmarks of @VAST_Data and black spots on the records of legacy players.

0

5

1

0

1K

about 1 year ago

@Chris_Mellor (2) It is then difficult to prevent an attacker from logging into the object storage system as, say, a storage administrator and deleting entire buckets. The end result is more or less the same.

0

39

about 1 year ago

@Chris_Mellor (1) While your assumption seems correct (supposing object versioning), the ransomware attack is not only aimed at encrypting data. Rather, it attempts to gain control of the network (credentials) and simply delete backups and other data that cannot be encrypted.

0

41

ScaleOut_Dude retweeted

about 1 year ago


@JeffDenworth

We're pumped to announce our first open source project: 🔹 VUA 🔹. VAST Data's Undivided Attention is our approach to giving AI agents infinite memory by extending tools like vLLM and NVIDIA Dynamo with a third tier of shared (undivided) context (attention).

The objective here is to lower time-to-first-token by giving AI machines much larger cache spaces. In our testing, VUA can lower the time to first token by as much as 75%, saving precious GPU time and enhancing the application experience. By extending an AI model's memory space to petabytes (or more) of KV cache, organizations can:

- affordably deploy models with super-large context windows... think terabytes of model memory (like Llama 4, which can sport up to 5TB of context data!!)

- support multi-turn inference sessions that bounce around GPU machines over time... VUA makes sure you never have to re-compute a session history (compute time otherwise scales quadratically as the context length grows... mucho expensivo).
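To make the quadratic-scaling point concrete, here is a rough sketch that counts the query-key pairs causal attention has to score. It is a simplified proxy, not a measurement: it ignores the per-token costs that scale linearly, and the session sizes are made-up illustrative values.

```python
def attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by causal attention over n_tokens (per layer, per head)."""
    return n_tokens * (n_tokens + 1) // 2


def prefill_pairs(total_tokens: int, cached_tokens: int = 0) -> int:
    """Pairs left to score when the first cached_tokens already have K/V entries."""
    return attention_pairs(total_tokens) - attention_pairs(cached_tokens)


# Illustrative session: a long history followed by a short new turn.
history, new_turn = 100_000, 500

full = prefill_pairs(history + new_turn)                            # no cache
resumed = prefill_pairs(history + new_turn, cached_tokens=history)  # history cached

print(f"recompute everything: {full:,} pairs")
print(f"resume from cache:    {resumed:,} pairs")
print(f"reduction:            {full / resumed:.0f}x")
```

Without a cache the work grows with the square of the session length; with the history cached it grows only with the length of the new turn times the history it attends to.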

🔹 Why open source? 🔹

We're big believers in standard interfaces. When we find standards we like, we help improve them so that access is always standardized (past examples: NFSoRDMA, NVMe/Fabrics). Now, with this extension to standard inference frameworks, we're hoping popular tools like vLLM and others just adopt this software for the benefit of the whole industry.

🔹 What does it work on? Most Everything! (but VAST is best 😘)🔹

VUA can work on any NFS endpoint, but VAST's Data Platform brings some real advantages to the table:
- a parallel NFS architecture that can handle any level of metadata intensity (which is important for prefix-based search)
- RDMA support for NFS (and soon S3) that optimizes both writes and reads
- NFS and S3 lifecycle policies to help customers manage capacity and enable the system to delete stale KV caches automatically. KV caches can throw off terabytes of data per GPU per day, so you need intelligent and simple data management mechanisms to ensure that cost doesn't run away.
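As a rough illustration of why prefix-based search is metadata-heavy, the sketch below keys cache files by a hash of each block-aligned token prefix and probes a directory for the longest match. This is a hypothetical scheme written for this example, not VUA's actual code or file layout; the block size, function names and placeholder payload are all assumptions, and a temporary directory stands in for an NFS mount.

```python
import hashlib
import tempfile
from pathlib import Path

BLOCK = 4  # tokens per cache block (illustrative; real systems use larger blocks)


def prefix_keys(tokens: list[int]) -> list[str]:
    """One key per full block; each key hashes every token up to that block's end."""
    keys = []
    h = hashlib.sha256()
    for end in range(BLOCK, len(tokens) + 1, BLOCK):
        h.update(",".join(map(str, tokens[end - BLOCK:end])).encode() + b";")
        keys.append(h.hexdigest())  # digest of the running prefix so far
    return keys


def store(root: Path, tokens: list[int], payload: bytes) -> None:
    """Write a placeholder KV payload under every block-aligned prefix key."""
    for key in prefix_keys(tokens):
        (root / key).write_bytes(payload)


def longest_cached_prefix(root: Path, tokens: list[int]) -> int:
    """Return how many leading tokens already have a cache file under root."""
    cached = 0
    for i, key in enumerate(prefix_keys(tokens), start=1):
        if not (root / key).exists():  # one metadata lookup per block
            break
        cached = i * BLOCK
    return cached


with tempfile.TemporaryDirectory() as tmp:  # stand-in for an NFS mount point
    root = Path(tmp)
    store(root, [1, 2, 3, 4, 5, 6, 7, 8], b"kv-bytes")
    hit = longest_cached_prefix(root, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
    miss = longest_cached_prefix(root, [9, 9, 9, 9])
    print(hit, miss)
```

Every probe is an existence check against the file system, so a long prompt turns into many small metadata operations before any cache data is read.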

To download the code, visit the ol' VAST Data GitHub page:

https://t.co/RWpCAjFzxI

To read more about VUA, check out the blog written by Dave Graham, Dan Aloni, Alon Horev and Matthew Rogers here: https://t.co/5xf4spMQDO

1

15

2

541

ScaleOut_Dude retweeted