Daft @daftengine - Twitter Profile

Pinned Tweet

6 months ago

.@SourcetableApp CTO @andrewgrosser shares his recommended tech stack for serious startups - a "wicked combination" that includes:

- S3 + Cassandra for data
- Daft for processing
- Python, WASM, Ray

Learn how they built the first AI-powered spreadsheet: https://t.co/ZX8fYQzfwt

daftengine retweeted

Everett Kleven

@everettkleven

7 days ago

First-class observability in Daft.

Operators, Tasks, Rows, Memory are all surfaced in a dashboard that ships with the install.

+ OTel endpoints for your existing collector.
+ Stuck detection.
+ DAFT_TRACE for console debugging.

~45 PRs across the observability stack.

https://t.co/HWyjBiaePN

daftengine retweeted

Everett Kleven

@everettkleven

14 days ago

🚢 Daft v0.7.14 has shipped

- Parquet reader rewrite: up to 17x faster remote reads
- Streaming distributed limits
- Native UUIDv7 generation
- JSON array/object functions

https://t.co/tHYeox55iY

daftengine retweeted

Everett Kleven

@everettkleven

15 days ago

Three Daft releases in four days.

v0.7.11: Arrow PyCapsule, streaming ASOF joins, Iceberg idempotent commits.
v0.7.12: Iceberg table properties + extension macro revert.
v0.7.13: Forward ASOF joins.

Upgrade straight to v0.7.13. https://t.co/NOm8HXHiWF

daftengine retweeted

Everett Kleven

@everettkleven

20 days ago

Daft now has native distributed ASOF joins. And it scales horizontally without data skew. https://t.co/lZuORgnuFU
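
For readers new to the term: an ASOF join pairs each left row with the nearest right row on an ordered key (typically a timestamp) instead of requiring an exact match. The sketch below is plain Python and only illustrates the semantics; it is not Daft's API or its distributed implementation. The function name `asof_join_backward` and the sample trade and quote rows are made up for illustration. It shows the backward direction (latest right key at or before the left key); a forward ASOF join would pick the earliest right key at or after it instead.

```python
from bisect import bisect_right


def asof_join_backward(left, right, on):
    """Attach to each left row the latest right row whose key is <= the left key.

    Hypothetical helper for illustration only; not part of Daft.
    """
    # Sort the right side once so each lookup is a binary search.
    right_sorted = sorted(right, key=lambda row: row[on])
    right_keys = [row[on] for row in right_sorted]

    joined = []
    for row in left:
        # Index of the last right key that is <= the left key, or -1 if none.
        idx = bisect_right(right_keys, row[on]) - 1
        merged = dict(row)
        if idx >= 0:
            for name, value in right_sorted[idx].items():
                if name != on:
                    merged[name] = value
        joined.append(merged)
    return joined


# Made-up sample data: trades matched to the most recent quote.
trades = [{"ts": 3, "qty": 10}, {"ts": 7, "qty": 5}, {"ts": 1, "qty": 2}]
quotes = [{"ts": 2, "price": 100.0}, {"ts": 6, "price": 101.5}]

result = asof_join_backward(trades, quotes, on="ts")
for row in result:
    print(row)
```

The trade at `ts=1` has no earlier quote, so it comes back without a `price` field; a real engine would typically fill that with a null.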

daftengine retweeted

Everett Kleven

@everettkleven

21 days ago

daft.VideoFile is perfect for Physical AI.

Open X-Embodiment aggregates over a million episodes. DROID alone runs 350+ hours of multi-camera 60fps footage. That's hundreds of millions of frames across a single dataset, and most action-model training doesn't need them all.

- read_video_frames: filter on keyframes; supports S3, GCS, & YouTube URLs.
- video_metadata: resolution, fps, duration, frame count from file headers.
- video_frames(start_time, end_time): decode a 10-second window from a 90-minute file.

Frames land as Image columns in the same DataFrame.
Feed them to a vision model, compute embeddings, and write to Iceberg.

Check out the blog
https://t.co/Ucn3SzF12g
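
The frame counts quoted above can be sanity-checked with a little arithmetic. The sketch below is plain Python and does not call Daft. Two inputs are assumptions rather than figures from the tweet: the camera count of 3, and the idea that the 90-minute file is also recorded at 60fps.

```python
def frame_count(seconds, fps):
    """Number of frames in a clip of the given duration at a constant frame rate."""
    return int(seconds * fps)


FPS = 60            # from the tweet: 60fps footage
HOURS = 350         # from the tweet: 350+ hours
CAMERAS = 3         # assumption: the tweet only says "multi-camera"

# Frames in one camera stream, then across all assumed streams.
per_stream_frames = frame_count(HOURS * 3600, FPS)
total_frames = per_stream_frames * CAMERAS

# A 10-second window versus a full 90-minute file (assumed to be 60fps too).
window_frames = frame_count(10, FPS)
full_file_frames = frame_count(90 * 60, FPS)

print(f"per stream: {per_stream_frames:,}")
print(f"all streams: {total_frames:,}")
print(f"window: {window_frames:,} of {full_file_frames:,} frames")
```

Under these assumptions a single stream is about 75.6 million frames and three streams come to roughly 227 million, which is consistent with "hundreds of millions". The 10-second window covers 600 of 324,000 frames, so decoding only that window touches about 1/540 of the file.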

Daft

@daftengine

26 days ago

WAM!!!

Everett Kleven

@everettkleven

26 days ago

VLAs are dead, long live World Action Models

So declares @DrJimFan, the most credible researcher in robotics today. https://t.co/UOFvpoz41l

👆 We just published a short blog where @ykdojo breaks down the video. It certainly helped me correct my mental model.

daftengine retweeted

Everett Kleven

@everettkleven

27 days ago

So turns out I'm not the only one who builds on @daftengine 😆

In fact, there's a TON of projects that leverage Daft natively to power their AI & data processing.

Daft is the Data Engine for AI.

> I say it because it's true.
> I keep saying it because the Daft community keeps giving back!

Check out all these projects! (link in the comments)

daftengine retweeted

Jay Chia - eventual.ai

@jaychia_

28 days ago

Probably my favorite episode yet!

Just finished filming our latest episode of Zero Shot Espresso with @danimberman, who is an @ApacheAirflow PMC, developed the @kubernetesio executor, and now helps technical teams ship production AI as a consultant. https://t.co/Kugq21wP9E

daftengine retweeted

Everett Kleven

@everettkleven

29 days ago

🚢 Daft v0.7.10

30 contributors (a release record!)
41 new features and functions.

Distributed as_of joins, SimHash dedupe, temporal arithmetic, C++ extensions.

https://t.co/Blit46bYww https://t.co/cSFytT2Hr4

daftengine retweeted

Sammy Sidhu

@Sammy_Sidhu

about 1 month ago

The fastest H3 geospatial indexing in Daft wasn't written by the Daft team.

Developed by Garrett Weaver, daft-h3 runs 3–16x faster than simply wrapping h3-py in a Python UDF. That speedup is thanks to Daft's Native Extensions powered by Apache Arrow's C Data Interface. https://t.co/XPgqJNLnLt

daftengine retweeted

Everett Kleven

@everettkleven

about 1 month ago

Most image embedding pipelines are actually two pipelines stitched together.

Script one: PySpark reads images from S3, resizes them, joins with metadata, writes to Delta Lake.
Script two: PyTorch loads ResNet, generates embeddings on GPU, writes back to Delta Lake.

Two frameworks. Two sets of dependencies. Two GPU configs. Serialization overhead at every boundary.

With Daft, it's one script. download → resize → join → embed → write. daft.cls handles GPU placement and batching. No handoff.

daftengine retweeted

Brittany

@brittwalker_

about 1 month ago

Proud but not surprised to see @CRV portco @daftengine punching above their weight 🤗 https://t.co/scLXq8YoSs

daftengine retweeted

Sammy Sidhu

@Sammy_Sidhu

about 1 month ago

Eventual was ranked #47 globally on Paraform’s Talent Density Index.

What I liked most about this wasn’t the ranking itself, but how they define it: not by who looks impressive on paper, but by who’s actually developing people the market is fighting for.

A friend put it better than I could:
“Honestly, it’s a testament to the talent you’re recruiting and fostering.”

Feels right.

Grateful to be building alongside this team.
https://t.co/86OlaDaTMf

daftengine retweeted

Everett Kleven

@everettkleven

about 1 month ago

daft.File is lazy — Nothing opens until a UDF calls .open() or .to_tempfile(). Filter millions of files by path and MIME type. Then open only the survivors. Markdown, PDFs, code, audio, video — same interface. https://t.co/tH8KmVavT1
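
The "filter first, open later" pattern described here can be illustrated without Daft. The sketch below is plain Python and is not Daft's implementation: the `LazyFile` class, its `opened` flag, and the extension-to-MIME table are all hypothetical stand-ins, and only the `.open()` method name is borrowed from the tweet. The point it shows is that filtering on path and MIME type needs no I/O, so only the surviving files are ever opened.

```python
import os
import tempfile

# Hypothetical lookup table; a real system would detect many more types.
EXT_TO_MIME = {
    ".txt": "text/plain",
    ".pdf": "application/pdf",
    ".mp4": "video/mp4",
}


class LazyFile:
    """Illustrative lazy file reference: holds a path and opens nothing until asked."""

    def __init__(self, path):
        self.path = path
        self.opened = False  # tracks whether open() was ever called

    @property
    def mime_type(self):
        # Derived from the file extension only, so no file I/O happens here.
        ext = os.path.splitext(self.path)[1].lower()
        return EXT_TO_MIME.get(ext, "application/octet-stream")

    def open(self):
        self.opened = True
        return open(self.path, "rb")


with tempfile.TemporaryDirectory() as tmp:
    # Create three small sample files of different types.
    paths = []
    for name in ("notes.txt", "paper.pdf", "clip.mp4"):
        path = os.path.join(tmp, name)
        with open(path, "wb") as handle:
            handle.write(name.encode())
        paths.append(path)

    files = [LazyFile(path) for path in paths]

    # Filter on metadata first; nothing has been opened yet.
    survivors = [f for f in files if f.mime_type == "text/plain"]

    # Open only the survivors.
    contents = []
    for f in survivors:
        with f.open() as handle:
            contents.append(handle.read())

    opened_flags = [f.opened for f in files]

print(contents)
print(opened_flags)
```

Only the text file ends up opened; the PDF and the video are filtered out on metadata alone.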

daftengine retweeted

Quentin Lhoest 🤗

@lhoestq

about 1 month ago

Highly recommend AI data people to follow @daftengine, these guys are cooking 👀

daftengine retweeted

0

@rev_proxy

about 2 months ago

been lurking into data/ai/ml stuff lately and came across @daftengine , pretty cool ngl. distributed query engine for running data + AI workloads at scale (text, images, embeddings, all of it). turns messy data into structured outputs without a ton of infra glue. plus it’s open source, which makes it even better. might try building something cool with it @Sammy_Sidhu

daftengine retweeted

Enrico Shippole

@EnricoShippole

about 2 months ago

We @TeraflopAI have worked together with @johngfriedman and @daftengine to open-source all major filings from SEC EDGAR completely for free on @huggingface. It is now more important than ever to push for open dataset releases.

daftengine retweeted

Everett Kleven

@everettkleven

about 2 months ago

Daft v0.7.9.

8 new temporal functions for Spark-compatible date arithmetic. video_frames() for column-level video decoding. Native UUID type.

Plus byte-level dashboard observability and initial ASOF join support. https://t.co/ZfPGr15ujC

daftengine retweeted

Everett Kleven

@everettkleven

about 2 months ago

8 million SEC filings. 43 billion tokens. 590 GB spanning 20 years of corporate financial data.

Processed on 12 cores in under 24 hours for $1.10.

@EnricoShippole, @TeraflopAI, and @daftengine open-sourced the full dataset on Hugging Face. https://t.co/Qj37sgbIvV

daftengine retweeted

Sammy Sidhu

@Sammy_Sidhu

about 2 months ago

“Daft is a distributed query engine that’s going to be replacing Spark.”

I remember this talk like it was yesterday 👇️

Two years ago @ SF Systems Meetup, @colin_ho99 and @desmondcheongzx walked up and floored the crowd, debuting v1 of Swordfish - our local execution engine. Even at the time, Daft demonstrated dramatically lower memory than Spark on TPCH.

17 months later we've delivered multiple iterations of Swordfish and Flotilla (distributed) with compounding adoption across top labs and startups.
