Jay Chia - eventual.ai

8 days ago

@everettkleven

First-class observability in Daft.

Operators, Tasks, Rows, Memory are all surfaced in a dashboard that ships with the install.

+ OTel endpoints for your existing collector.
+ Stuck detection.
+ DAFT_TRACE for console debugging.

~45 PRs across the observability stack.

https://t.co/HWyjBiaePN

jaychia_ retweeted

14 days ago

🚢 Daft v0.7.14 has shipped

- Parquet reader rewrite: up to 17x faster remote reads
- Streaming distributed limits
- Native UUIDv7 generation
- JSON array/object functions

https://t.co/tHYeox55iY

jaychia_ retweeted

16 days ago

Three Daft releases in four days.

v0.7.11: Arrow PyCapsule, streaming ASOF joins, Iceberg idempotent commits.
v0.7.12: Iceberg table properties + extension macro revert.
v0.7.13: Forward ASOF joins.

Upgrade straight to v0.7.13.

https://t.co/NOm8HXHiWF

19 days ago

Truly enduring databases don’t come around very often, but I have a feeling that @turbopuffer might be one of them :)

Sualeh Asif

@sualehasif996

21 days ago

A conversation with @sirupsen on scaling Shopify, building turbopuffer, and the future of databases.

0:00 - Scaling Shopify through flash sales and outages
8:13 - How top infrastructure teams collaborated in the 2010s
10:35 - Engineering principles from Logrus and on-call
17:38 - The story behind Simon’s famous-ish blog, Napkin Math
23:05 - Why new database companies keep winning
32:21 - How Simon became a fan of databases
35:45 - AI coding, and where agents still fail
42:10 - Hiring P99 engineers in the AI era
48:45 - What’s next for databases

20 days ago

WAMs :)))

21 days ago

This week's Physical AI Newsletter is packed with updates. Definitely check out the survey on World Action Models. Not only does it clarify the differences between VLAs, World Models, and World Action Models, but it also contextualizes the algorithms and training strategies for all of the models being released.

21 days ago

This week's post: https://t.co/FHCQJ4m5Zl

jaychia_ retweeted

21 days ago

Daft now has native distributed ASOF joins. And it scales horizontally without data skew. https://t.co/lZuORgnuFU
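
For readers new to the term: an ASOF join matches each left-hand row to the nearest right-hand row by an ordered key (usually a timestamp) instead of requiring an exact match. The sketch below shows the single-machine idea in plain Python using a binary search. It is not Daft's API or its distributed implementation; the function name `asof_join`, the `direction` parameter, and the tuple layout are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def asof_join(left, right, direction="backward"):
    """Match each (ts, value) row in `left` to the nearest row in `right`.

    backward: the last right row whose timestamp is <= the left timestamp.
    forward:  the first right row whose timestamp is >= the left timestamp.
    Rows with no match get None. Names and layout here are hypothetical.
    """
    right_sorted = sorted(right, key=lambda row: row[0])
    keys = [ts for ts, _ in right_sorted]
    joined = []
    for ts, left_value in left:
        if direction == "backward":
            idx = bisect_right(keys, ts) - 1  # -1 means nothing at or before ts
        else:
            idx = bisect_left(keys, ts)
            if idx == len(keys):  # nothing at or after ts
                idx = -1
        right_value = right_sorted[idx][1] if idx >= 0 else None
        joined.append((ts, left_value, right_value))
    return joined


quotes = [(1, 100.0), (5, 101.5), (10, 99.0)]
trades = [(3, "buy"), (7, "sell"), (12, "buy")]
print(asof_join(trades, quotes))
print(asof_join(trades, quotes, direction="forward"))
```

In the backward case each trade picks up the most recent quote at or before it; in the forward case the trade at timestamp 12 has no later quote and gets None.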

jaychia_ retweeted

22 days ago

@everettkleven

daft.VideoFile is perfect for Physical AI.

Open X-Embodiment aggregates over a million episodes. DROID alone runs 350+ hours of multi-camera 60fps footage. That's hundreds of millions of frames across a single dataset, and most action-model training doesn't need them all.

- read_video_frames: filter on keyframes; supports S3, GCS, & YouTube URLs.
- video_metadata: resolution, fps, duration, frame count from file headers.
- video_frames(start_time, end_time): decode a 10-second window from a 90-minute file.

Frames land as Image columns in the same DataFrame.
Feed them to a vision model, compute embeddings, and write to Iceberg.

Check out the blog
https://t.co/Ucn3SzF12g
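
The frame counts behind that claim are easy to check with back-of-the-envelope arithmetic. The sketch below is only that: `frame_count` is a hypothetical helper, not a Daft function, and two inputs are assumptions rather than figures from the post: a camera count of 3 for "multi-camera", and 60 fps for the 90-minute file.

```python
def frame_count(duration_s, fps, cameras=1):
    """Frames produced by `cameras` streams of `duration_s` seconds at `fps`."""
    return round(duration_s * fps * cameras)


HOURS = 350  # "350+ hours" from the post, taken as a lower bound
FPS = 60

per_camera = frame_count(HOURS * 3600, FPS)
three_cameras = frame_count(HOURS * 3600, FPS, cameras=3)  # camera count is an assumption

window = frame_count(10, FPS)          # a 10-second window
full_file = frame_count(90 * 60, FPS)  # a 90-minute file, assuming 60 fps

print(f"per camera:    {per_camera:,}")
print(f"three cameras: {three_cameras:,}")
print(f"window / file: {window:,} / {full_file:,} frames")
```

Under those assumptions a single camera already yields 75.6 million frames, and a 10-second window touches 600 of the 324,000 frames in a 90-minute file, which is why decoding only the window matters.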

jaychia_ retweeted

Alex Shan

@alexshander03

23 days ago

We’re launching @JudgmentLabs today and announcing $32M in funding. As AI agents take on more of the work that creates economic value, they generate massive amounts of production data: the clearest record of how they behave with users, software, and the real world. Judgment builds infrastructure for improving AI agents from production data.

28 days ago

@danimberman @ApacheAirflow @kubernetesio Episode goes live in 2 weeks! In the meantime, check out our previous episodes: https://t.co/5DESbB3XSV

28 days ago

Probably my favorite episode yet!

Just finished filming our latest episode of Zero Shot Espresso with @danimberman who is an @ApacheAirflow PMC, developed the @kubernetesio executor, and now helps technical teams ship production AI as a consultant. https://t.co/Kugq21wP9E

28 days ago

@danimberman @ApacheAirflow @kubernetesio We chatted about how open-source software is changing in the AI era, what it's like running a solo consulting business, and the biggest difference between senior and principal engineers. https://t.co/LvtLkCu7Oe

29 days ago

Yes come join eventual. We have cookies. And really powerful GPUs.

Brian LaManna

@BrianLaManna_

about 1 month ago

Companies I'd consider going to if I ever had the urge to try something new.

1 Thinking Machines Lab
2 OpenAI
3 Anthropic
4 Cursor
5 Applied Intuition
6 Modal Labs
7 Decagon
8 Voyage AI
9 Cohere
10 Glean
11 LangChain
12 Ramp
13 Together AI
14 Fireworks AI
15 Cognition
16 Harvey
17 Scale AI
18 Warp
19 Hebbia
20 Rogo
21 Augment
22 Parallel Web Systems
23 Baseten
24 Brain Co.
25 Linear
26 Mercor
27 Mistral AI
28 Nuro
29 Adept
30 Vanta
31 Traversal
32 Metronome
33 ElevenLabs
34 Factory
35 Anyscale
36 Vannevar Labs
37 Abridge
38 The Browser Company
39 Reevo
40 Chalk
41 Nominal
42 Cartesia
43 Pinecone
44 Hex Technologies
45 Merge
46 Whatnot
47 Eventual
48 Faire
49 Arena
50 Bedrock Robotics

List courtesy of Paraform's Talent Density Rankings.

https://t.co/IvPW3sir9U

jaychia_ retweeted

30 days ago

@everettkleven

🚢 Daft v0.7.10

30 contributors (a release record!)
41 new features and functions.

Distributed as_of joins, SimHash dedupe, temporal arithmetic, C++ extensions.

https://t.co/Blit46bYww https://t.co/cSFytT2Hr4
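
As background on the SimHash dedupe item: SimHash reduces a document to a short fingerprint so that near-duplicate documents end up with fingerprints that differ in only a few bits. The sketch below is a generic, minimal version of the technique in plain Python, not Daft's implementation. The whitespace tokenizer, the 64-bit width, the use of SHA-256 as the per-token hash, and the names `simhash` and `hamming` are all assumptions for illustration.

```python
import hashlib


def simhash(text, bits=64):
    """Return a `bits`-wide SimHash fingerprint of `text` (bits <= 256)."""
    weights = [0] * bits
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest, "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


doc_a = "Daft now has native distributed ASOF joins"
doc_b = "daft now has native distributed asof joins"
print(hamming(simhash(doc_a), simhash(doc_b)))  # identical after lowercasing
```

A dedupe pass would then treat two documents as near-duplicates when the Hamming distance between their fingerprints falls below a chosen threshold; the right threshold depends on the data and is not specified in the post.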

jaychia_ retweeted

Sammy Sidhu

@Sammy_Sidhu

about 1 month ago

The fastest H3 geospatial indexing in Daft wasn't written by the Daft team.

Developed by Garrett Weaver, daft-h3 runs 3–16x faster than simply wrapping h3-py in a Python UDF. That speedup is thanks to Daft's Native Extensions, powered by Apache Arrow's C Data Interface. https://t.co/XPgqJNLnLt

about 1 month ago

Working with video data... Send help. Guys I don't think big data is dead.

about 1 month ago

The pace of multimodal AI is actually crazy right now. I think this is it. I’ve been crying wolf for 4 straight years, but I think it’s coming for real now. We’re about to see that ChatGPT moment very, very soon.

jaychia_ retweeted

about 1 month ago

@everettkleven

Most image embedding pipelines are actually two pipelines stitched together.

Script one: PySpark reads images from S3, resizes them, joins with metadata, writes to Delta Lake.
Script two: PyTorch loads ResNet, generates embeddings on GPU, writes back to Delta Lake.

Two frameworks. Two sets of dependencies. Two GPU configs. Serialization overhead at every boundary.

With Daft, it's one script. download → resize → join → embed → write. daft.cls handles GPU placement and batching. No handoff.
