Relational Foundation Models face a scaling problem: diverse training datasets are rarely public due to privacy constraints 🔒.
🚀 We are excited to introduce "PluRel": a framework that synthesizes diverse multi-table relational databases from scratch, unlocking scaling laws for RFMs. 🧵
Kudos to the amazing collaborators at @StanfordAILab, @Kumo_ai_team, and @SAP: @_rishabhranjan_ @VHudovernik @vijaypradwi @johanneshoffart @guestrin @jure
Synthetic data is critical for foundation models, even more so in relational and tabular domains where public data is scarce. Our new work shows how synthetic pretraining unlocks a whole new axis to scale up relational foundation models (RFMs)!
This was a super fun collaboration with @kvignesh1420, @VHudovernik, @vijaypradwi, @johanneshoffart, @guestrin and @jure.
Paper: https://t.co/4jzttapESf
Code, data, models: https://t.co/gd58JA9imo
Exciting work on synthetic data generation that demonstrates, for the first time, scaling laws for graph/relational foundation models.
Great work by @kvignesh1420, @_rishabhranjan_, @VHudovernik, and our collaborators at @Kumo_ai_team and @SAP
Although relational databases are everywhere, there is no equivalent of the public internet for pretraining Relational Foundation Models (RFMs). Excited to see RelBench bridging that gap, growing from 7 datasets in v1 to 88+ datasets in v2.
Deeply grateful for the numerous community contributions that help RelBench serve as the central data repository for RFM research. ❤️
🚀 Announcing RelBench V2, a major update to our benchmark for foundation models on relational data!
With V2, we are significantly expanding the benchmark’s scope to catalyze further research in Relational Deep Learning (RDL) and Relational Foundation Models (RFMs).
Key features:
🍺 4 new databases, spanning domains from e-commerce and beer reviews to scientific research and clinical healthcare.
🧩 40 new predictive tasks, including 28 autocomplete tasks, across new and existing databases.
🔌 External data integrations: 70+ datasets from CTU, 7 datasets from 4DBInfer, and your own data via a SQL connector, all in RelBench format.
🛠️ Bug fixes and performance improvements.
🔥 Introducing autocomplete tasks: As opposed to forecasting tasks, autocomplete tasks predict existing columns in the database. We found that models need to deeply understand the relational context to autocomplete database fields, a critical capability that expands the scope of real-world RDL applications.
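A minimal sketch of the difference, assuming a toy single-table setup: the `reviews` rows and the helpers `autocomplete_task` and `forecasting_task` are hypothetical illustrations, not the RelBench API. An autocomplete task hides a column that already exists and uses it as the label, while a forecasting task derives its label from rows that arrive after a cutoff date.

```python
from datetime import date

# Hypothetical toy table for illustration only; not a RelBench dataset.
reviews = [
    {"review_id": 1, "user_id": 10, "rating": 4, "date": date(2024, 1, 5)},
    {"review_id": 2, "user_id": 10, "rating": 5, "date": date(2024, 2, 1)},
    {"review_id": 3, "user_id": 11, "rating": 2, "date": date(2024, 2, 9)},
]


def autocomplete_task(rows, target_column):
    """Hide an existing column and use its stored value as the label."""
    examples = []
    for row in rows:
        features = {k: v for k, v in row.items() if k != target_column}
        examples.append((features, row[target_column]))
    return examples


def forecasting_task(rows, cutoff):
    """Label each user seen up to the cutoff with their review count after it."""
    users = sorted({r["user_id"] for r in rows if r["date"] <= cutoff})
    return [
        (u, sum(1 for r in rows if r["user_id"] == u and r["date"] > cutoff))
        for u in users
    ]


autocomplete_examples = autocomplete_task(reviews, "rating")
forecast_examples = forecasting_task(reviews, date(2024, 1, 31))
```

In a real relational setting the model would also draw on linked rows in other tables to fill in the hidden value; the sketch keeps to one table for brevity.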
Learn more:
🌐 Website: https://t.co/G4OBtj0R92
💻 GitHub: https://t.co/99FBJK5kji
Huge thanks to @justingu32, @_rishabhranjan_, @jakub_peleska, @VHudovernik, @CKanatsoulis, @fengyuli607, Tang Haiming, Alistiq, and everyone else who contributed to our GitHub for making this possible!
💠 Stanford Graph Learning Workshop 2024! Join leaders from academia and industry to explore the latest in Machine Learning and AI. Topics include Relational domains, Foundation Models, Agents and more.
Save the date: Tuesday, Nov 5, 2024, 09:00 - 18:00 PT. The event will be held at Stanford University and live-streamed online.
Register and/or submit a talk/poster: https://t.co/QpVB0AIWyI
🚀 Announcing RelBench: an open benchmark for deep learning on relational databases! RelBench is the foundational infrastructure for research in Relational Deep Learning (RDL), which brings modern AI to structured data.
RelBench has databases, tasks, loaders, evaluators, and leaderboards to catalyze research in the field!
Key features:
🌍 7 datasets spanning diverse domains: e-commerce, social, medical, and sports.
🧩 30 carefully curated predictive tasks, including entity classification/regression and recommendation.
📊 Wide data size range: from 74K to 41M rows, 15 to 140 columns, and 3 to 15 tables.
⏳ Wide time spans: from 2 weeks to 55 years of training data.
🏅 Comprehensive benchmarks: SOTA tabular learning and GNN baselines for every task.
🔥 We hired a data scientist with 5 years of industry experience to solve RelBench tasks using traditional machine learning (feature engineering, model training). RDL outperforms the data scientist in accuracy while reducing the time/code by 20x (12.3 hours -> 0.5 hours)!!! 🤯
Learn more:
🌐 Website: https://t.co/BzwWxv9lNb
📄 Paper: https://t.co/bR3yxYVPyc
💻 GitHub: https://t.co/EXdsMNTkEW
Follow @RelBench for the latest updates
Shoutout to the amazing team: @Josh_d_robinson @_rishabhranjan_ @weihua916 @KexinHuang5 @jiaqihan99 @adobles96 @rusty1s @janericlenssen @yiwenyuan98 @zechengzh @xhe1997 @Kumo_ai_team @PyG_Team @StanfordAILab