Apache Hudi @ApacheHudi - Twitter Profile

Pinned Tweet

over 1 year ago

Hudi 1.0 is the most powerful release to date for data lakehouses. Read the blog for details: Secondary Indexing, Expression Indexes, Partial Updates, Non-blocking Concurrency Control, New LSM timeline, +more: https://t.co/QHi7rnNsn9 #datalakehouse #opentableformat

Apache Hudi

@apachehudi

2 days ago

Full case study on AWS Big Data Blog: https://t.co/GdkDj7joPB

Apache Hudi

@apachehudi

2 days ago

GE Aviation: 30+ source systems in production, several hundred Apache Hudi tables, 14+ months in production, 10,000+ tables in the dev pipeline. Modernizing 150+ source systems across the aviation ecosystem ↓

Photo: https://t.co/u3wsonoGrQ

Apache Hudi

@apachehudi

2 days ago

Aviation is deeply regulated. Hudi ran in production here at enterprise scale. Links ↓ #ApacheHudi #AWS

Apache Hudi

@apachehudi

3 days ago

Compaction docs: https://t.co/lwftszd3Lt Async services overview: https://t.co/THG2II0EFx

Apache Hudi

@apachehudi

3 days ago

Compaction is the price you pay for write speed on MoR tables. Hudi is the only open table format that does it without blocking writes.

Photo: https://t.co/mzHOOHdAHy

Apache Hudi

@apachehudi

3 days ago

Tunable: inline / after N commits / time-based / async. Selection by log size, file-group activity, or partition. A fully compacted MoR table reads identically to CoW. The question is just when to reconcile. Links ↓ #ApacheHudi #MergeOnRead
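
The scheduling modes listed above can be sketched as Hudi writer options. This is a minimal sketch, not the project's reference configuration: the helper `compaction_options` and its mode names are hypothetical, "inline" is read here as "compact on every commit", and the option keys should be checked against the compaction docs for your Hudi version.

```python
# Hypothetical helper: maps a scheduling mode from the tweet to Hudi write options.
# Key names are taken from Hudi's configuration reference; verify them for your version.

def compaction_options(mode, n_commits=5, max_seconds=3600):
    """Return a dict of Hudi writer options for one compaction scheduling mode."""
    if mode == "inline":
        # Assumption: "inline" means compact synchronously after every delta commit.
        return {
            "hoodie.compact.inline": "true",
            "hoodie.compact.inline.trigger.strategy": "NUM_COMMITS",
            "hoodie.compact.inline.max.delta.commits": "1",
        }
    if mode == "num_commits":
        # Compact inline once N delta commits have accumulated.
        return {
            "hoodie.compact.inline": "true",
            "hoodie.compact.inline.trigger.strategy": "NUM_COMMITS",
            "hoodie.compact.inline.max.delta.commits": str(n_commits),
        }
    if mode == "time_elapsed":
        # Compact inline once enough time has passed since the last compaction.
        return {
            "hoodie.compact.inline": "true",
            "hoodie.compact.inline.trigger.strategy": "TIME_ELAPSED",
            "hoodie.compact.inline.max.delta.seconds": str(max_seconds),
        }
    if mode == "async":
        # Leave the write path alone and let an async service run compaction.
        return {
            "hoodie.compact.inline": "false",
            "hoodie.datasource.compaction.async.enable": "true",
        }
    raise ValueError(f"unknown compaction mode: {mode}")


if __name__ == "__main__":
    for mode in ("inline", "num_commits", "time_elapsed", "async"):
        print(mode, compaction_options(mode))
```

With Spark, a dict like this would typically be merged into the other writer options, for example via `df.write.format("hudi").options(**opts)`; that call is shown only as an illustration.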

Apache Hudi

@apachehudi

3 days ago

AI overview docs: https://t.co/6KYOp4evlF Vector search: https://t.co/fwH3VvuoTr BLOB type: https://t.co/g67rl6saZw Lance file format: https://t.co/AfTT58MEhN

Apache Hudi

@apachehudi

3 days ago

AI workloads on the lakehouse have a different shape than analytics. Embeddings (768–3072 floats/row) replace counts and sums. Raw assets become first-class. Queries shift to "nearest neighbors filtered by tenant." Feature tables get thousands of columns wide.

Photo: https://t.co/bkkJCoKPAc
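
To make the "nearest neighbors filtered by tenant" query shape concrete, here is a minimal brute-force sketch in plain Python. It is not Hudi's vector search API: the row layout, the field names (`id`, `tenant`, `embedding`), and the function names are assumptions made for illustration, and the toy embeddings have 3 dimensions rather than 768 to 3072.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbors(rows, query, tenant, k=2):
    """Filter rows by tenant first, then rank the survivors by similarity to the query."""
    candidates = [r for r in rows if r["tenant"] == tenant]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(r["embedding"], query),
        reverse=True,
    )
    return [r["id"] for r in ranked[:k]]


# Toy table: one embedding per row, plus a tenant column used as the filter.
rows = [
    {"id": "a", "tenant": "t1", "embedding": [1.0, 0.0, 0.0]},
    {"id": "b", "tenant": "t1", "embedding": [0.9, 0.1, 0.0]},
    {"id": "c", "tenant": "t2", "embedding": [1.0, 0.0, 0.0]},
    {"id": "d", "tenant": "t1", "embedding": [0.0, 1.0, 0.0]},
]

if __name__ == "__main__":
    print(nearest_neighbors(rows, [1.0, 0.0, 0.0], tenant="t1"))
```

Row `c` matches the query exactly but belongs to another tenant, so it never appears in the result. A full scan like this costs time linear in the number of rows, which is why this workload usually calls for an index.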

Apache Hudi

@apachehudi

3 days ago

AI-native isn't a separate stack. It's the same lakehouse with the primitives AI workloads actually need. No need to copy data into another specialized system. Links ↓ #ApacheHudi #AINativeLakehouse

Apache Hudi

@apachehudi

4 days ago

Full AWS reference architecture: https://t.co/xHznv3wrzy

Apache Hudi

@apachehudi

4 days ago

Configuration Manager — table settings in DynamoDB; one Glue job processes multiple tables (tested with 18).

Apache Hudi

@apachehudi

4 days ago

DynamoDB global tables for cross-region failover. Audit tracking for the regulatory crowd. Links ↓ #ApacheHudi #AWS

Apache Hudi

@apachehudi

4 days ago

File Manager — detects CDC files via EventBridge, queues through SQS, tracks metadata in DynamoDB. File Processor — Step Functions orchestrating parallel Glue jobs that produce Hudi datasets.

Apache Hudi

@apachehudi

4 days ago

AWS pattern for ingesting hundreds of operational databases into one Hudi lakehouse — without N pipelines for N tables. A three-component framework ↓

Photo: https://t.co/aJpDJMMPVo
