๐ThalamusDB counting pictures of red cars in the database:
- Semantic operators are described in natural language and evaluated via GPT-5
- Simply store paths to images or audio files in your database โ ThalamusDB recognizes the file format and selects the right LLM
๐พ Code: https://t.co/6l1gS8zZFA
๐ Website: https://t.co/33PAzsmI7T
#SemanticQueries #ApproximateProcessing #LLM #GPT5 #ThalamusDB
๐ก Two arXiv papers published in recent days (one from us, one from TUD) reach the same conclusion: LLMs can now generate C++ code for SQL processing that outperforms classical database systems.
โ๏ธ Our code generator is based on Claude Code and exploits multiple agents working in parallel. Each agent performs tasks typically associated with different components in a #DBMS, such as workload analysis, query optimization, or physical design tuning.
๐ We compare to various classical #DBMS such as DuckDB, ClickHouse, Umbra, MonetDB, and PostgreSQL, finding that the agent-generated code is often significantly faster. Code generation costs are moderate (<$20), making the approach practical for frequently executed queries.
๐ค Analyzing generated code, we find that agents exploit various optimization techniques, including query-specific data structures, as well as low-level optimizations that are specific to the hardware cache hierarchy of our server.
๐ Paper: https://t.co/PBLC9XX5Tv
๐พ Code: https://t.co/MvdH8lHT6F
๐ Site: https://t.co/2WM2gedwbi
@lojil192574 #LLM #Databases #AI #DB
A demo of #ThalamusDB (#SIGMOD2023), introducing semantic filter operators. Users write SQL queries with natural language predicates on table columns containing ๐ผ๏ธ images, ๐ text, or ๐ sound files. These predicates are evaluated via #LLMs.
In the video (below), I'm querying for furniture ads with pictures showing "wooden tables". After entering my query, #ThalamusDB
1๏ธโฃ performs data profiling and cost-based optimization,
2๏ธโฃ shows the Pareto frontier of cost-quality tradeoffs,
3๏ธโฃ updates bounds on query aggregates while processing.
#ThalamusDB is designed from the ground up for approximate processing, prioritizing data that maximally reduces approximation error per cost unit.
๐ชง #SIGMOD2023 demo: https://t.co/hELlBtKRZb
๐ #SIGMOD2024 paper: https://t.co/jzfd4IUbGt
๐พ Code repository: https://t.co/zhRn5xX4Km
@SaehanJo@sigmod #GPT4 #LanguageModel #MultimodalData @Cornell@CornellCIS
๐ฅณPaper accepted at #SIGMOD2026! Our paper leverages #DigitalAnnealers (hardware accelerators for optimization) for #QueryOptimization. We scale up to large problem instances using 1โฃdomain-specific problem decomposition and 2โฃpre/post-processing on classical machines. @sigmod
๐ฅณMany congrats to Dr. Saehan Jo!
๐Saehan successfully defended his PhD thesis "Efficient Data Systems for Scalable Analysis with LLMs", introducing systems like #ThalamusDB and #SpareLLM that scale up processing with #LLMs to very large data sets!
@SIGMODConf#Data#SQL#ML