FactIQ

2 days ago

Today, we’re re-launching FactIQ. Six months ago, we launched a search and visualization engine for economic data. You could ask for a dataset, find the right series, and turn it into a chart. But over the last few months, the role of agents has changed. They are no longer just useful for saving a few minutes on repetitive work. They now act like a tireless second pair of eyes. Digging through data, testing competing explanations, and surfacing evidence you may not have looked out for. That opened up a much bigger opportunity for FactIQ. Macro analysts don’t just need another way to make charts. They need to answer the hardest and most important question in research: what’s actually happening? Answering that means looking beyond the obvious narrative. Beyond the single indicator everyone is watching. Beyond the chart that confirms what you already believed. The new FactIQ turns a macro question into an investigation. It breaks the question into the explanations that could be true, searches across official data, global institutions, government releases, news, and trusted industry sources, and tests which explanations are actually supported by the evidence. The goal is simple: give every macro analyst the capabilities of an institutional research desk. Use FactIQ to write macro notes, brief clients, support investment decisions, or pressure-test your view before publishing it. Try it today at https://t.co/o5PXgDbnq3. We would love to know what you think.

tryfactiq retweeted

4 months ago

https://t.co/nKk2ci9yCb

Transform how work gets done with custom AI agents, connected to your company knowledge and tools, powered by the best AI models. Just use Dust.

6 months ago

Excited to launch FactIQ today! 🚀 We just indexed 7.4M+ official US data series to build the ultimate economic research agent. Visualize trends instantly. Verify every source. Export charts for your reports. Free for the next week - try it out at factiq[dot]com!

21K

tryfactiq retweeted

Chandan @kchandanAI

about 1 year ago

I had the opportunity to present DB-Agent, an open-source AI agent designed for intelligent database interactions, at AAAI-25! It’s simple streamlit app with @defogdata SQL model. Works great across most platforms and game changer for non tech users who don’t write SQL #AAAI25

kchandanAI's tweet photo. I had the opportunity to present DB-Agent, an open-source AI agent designed for intelligent database interactions, at AAAI-25! It’s simple streamlit app with @defogdata SQL model. Works great across most platforms and game changer for non tech users who don’t write SQL #AAAI25 https://t.co/98xYb618eI

Who to follow

Dust

@DustHQ

Spark Capital

@sparkcapital

References available upon request.

anton

@abacaj

tryfactiq retweeted

Manas Sharma @manshar_

over 1 year ago

1.5 years of work, re work, learning, and more work, by this awesome team @defogdata is now open for all to use 🎉

tryfactiq retweeted

over 1 year ago

Open-sourcing Introspect: MIT-licensed Deep-Research for your internal data! Works with spreadsheets, databases, PDFs, and web search. Has a remarkably simple architecture – Sonnet agent armed with recursive tool calling and 3 default tools. Best for use-cases where you want to combine insights from SQL with unstructured data + data from the web. Launching a bunch of products built on top of of this soon, but wanted to open-source our core engine to give back to the community! 2 min demo below, links in comments

653

813

71K

tryfactiq retweeted

Caleb

@calebfahlgren

over 1 year ago

Benchmarking Open vs Closed SOTA models on Text-2-SQL via @defogdata SQL Eval using @openrouter 🐋 Deepseek V3 🟠 Sonnet 3.5 ⚫ gpt4o-latest 🦦 Qwen2.5-Coder 32B 🦙 Llama-3.3 70B

calebfahlgren's tweet photo. Benchmarking Open vs Closed SOTA models on Text-2-SQL via @defogdata SQL Eval using @openrouter

🐋 Deepseek V3
🟠 Sonnet 3.5
⚫ gpt4o-latest
🦦 Qwen2.5-Coder 32B
🦙 Llama-3.3 70B https://t.co/ar45lAo65p

tryfactiq retweeted

almost 2 years ago

We made a thing! Very happy to announce sqlcoder-pro and the Defog Alignment Platform. Available to use immediately without a wait-list, weights will be open-sourced very soon. The video does a quick show and tell comparison against ChatGPT (with gpt-4o). Read on for more details! TLDR 💪 equal (or better) performance on text-to-SQL as the most capable Claude-3.5 or GPT-4 models 🤝 You can use it today on a free plan/free trial, without a waitlist 🪽 self-hostable on a single RTX4090, with 2 second median generation times for SQL queries 🔁 exactly the same output every time, give the same prompt 👨🏻‍🏫 teachable and steerable: show the model what you want it to do 🛞 debuggable – you can understand WTF is going on inside the model, instead of treating it like a black box Let's dig into each of these one-by-one! Performance SQLCoder-8b-pro significantly exceeds the performance of our previous sqlcoder-8b model on Postgres text-to-SQL (from 88.2% to 90.2% accuracy - gpt-4o is at 87.6%, for reference). It is also better at following instructions. This was done via self-merges, hand crafted fine-tuning data, and adapting the training data to fit our tokenizer. Cost You can host this on the model on a single $3,500 RTX4090, and support ~5 requests/second via VLLM. If you're looking to host on the cloud instead, you can run it on a single L4 GPU that costs $300/mo on GCP Repeatability We have a dense 8b model with no MoE shenanigans. For the same prompt with temperature=0, you'll always get the same answer – which is critical in BI. Teachable In our alignment and feedback modes, you can give the model feedback on how it answered certain questions, and it will automatically adapt to the feedback. Debuggable You can use logprobs and attention scores to determine where, exactly is the model paying attention to inside a prompt + what it's getting confused by when generating outputs. Available today You can use Defog on the cloud today by going to docs[dot]defog[dot]ai, and getting an API key. Excited to hear what you think!

122

13K

tryfactiq retweeted

almost 2 years ago

Just finished running evals for Postgres text-to-SQL on the new Llama 3 models TLDR - Unfinetuned llama models not (yet) as good as OpenAI and Claude models, but will easily outperform with finetuning on domain specific tasks - Llama 3.1 8B is faaaar better than the Llama 3 8b Performance on SQL Eval 2 We have hand-crafted 314 questions (many of which are fairly difficult) for deterministic text to SQL evaluations. If the model generates a SQL query, we actually execute it against a database to check its performance. OpenAI and Anthropic's latest models are really good at text to SQL. Unfinetuned Llama3.1-405b-instruct models often got almost there, but fell short at the last mile. I tried a bunch of prompts (including CoT prompts), and results were +-1% of each other If you'd like, you can run the evals yourself at https://t.co/n0CxuKqREN. I ran them on an 8xH100 from runpod, and it probably took ~$20 or so to run all the evals! The promise of fine tuning Our 8b Llama-3 finetune performed on-par with or better than the OpenAI models for text to SQL (yes, we did make sure to prevent data contamination :P). We've been working on this finetune for 3 months and will release a version of it soon. For codegen, understanding Llama's tokenizer was critical for best performance. For the last 3 months, most of the my time was spent starting at attention scores + logprobs, using them to find gaps in training data viz-a-viz the Llama tokenizer, then updating training data and starting training runs again. The jump up from Llama3 to 3.1 We saw a big jump in capabilities from Llama-3-8B to 3.1, though didn't see as big of an increase for the 70B variant. I'll get a few more finetuning runs on the 3.1 variant up, and will see if 3.1 finetunes outperform 3 finetunes significantly.

rishdotblog's tweet photo. Just finished running evals for Postgres text-to-SQL on the new Llama 3 models

TLDR
- Unfinetuned llama models not (yet) as good as OpenAI and Claude models, but will easily outperform with finetuning on domain specific tasks
- Llama 3.1 8B is faaaar better than the Llama 3 8b

Performance on SQL Eval 2
We have hand-crafted 314 questions (many of which are fairly difficult) for deterministic text to SQL evaluations. If the model generates a SQL query, we actually execute it against a database to check its performance.

OpenAI and Anthropic's latest models are really good at text to SQL. Unfinetuned Llama3.1-405b-instruct models often got almost there, but fell short at the last mile. I tried a bunch of prompts (including CoT prompts), and results were +-1% of each other

If you'd like, you can run the evals yourself at https://t.co/n0CxuKqREN. I ran them on an 8xH100 from runpod, and it probably took ~$20 or so to run all the evals!

The promise of fine tuning
Our 8b Llama-3 finetune performed on-par with or better than the OpenAI models for text to SQL (yes, we did make sure to prevent data contamination :P). We've been working on this finetune for 3 months and will release a version of it soon.

For codegen, understanding Llama's tokenizer was critical for best performance. For the last 3 months, most of the my time was spent starting at attention scores + logprobs, using them to find gaps in training data viz-a-viz the Llama tokenizer, then updating training data and starting training runs again.

The jump up from Llama3 to 3.1
We saw a big jump in capabilities from Llama-3-8B to 3.1, though didn't see as big of an increase for the 70B variant.

I'll get a few more finetuning runs on the 3.1 variant up, and will see if 3.1 finetunes outperform 3 finetunes significantly.

118

10K

tryfactiq retweeted

Marktechpost AI

@Marktechpost

about 2 years ago

Defog AI Introduces LLama-3-based SQLCoder-8B: A State-of-the-Art AI Model for Generating SQL Queries from Natural Language Defog introduced LLama-3-based SQLCoder-8B, a state-of-the-art model for generating SQL queries from natural language. This new model stands out by addressing the limitations of prior systems. Traditional models often buckle under the pressure of complex, instruction-heavy queries or fail to adapt to the nuances presented by different database frameworks. SQLCoder-8B revolutionizes this landscape by integrating a broader spectrum of training data encompassing various instructions and more challenging SQL generation tasks. SQLCoder-8B distinguishes itself through a refined methodology that significantly enhances its capability to process and follow intricate instructions, leading to highly accurate SQL outputs. The model has been rigorously trained on a dataset enriched with diverse SQL query scenarios. This training is designed to equip the model with the versatility to tackle real-world applications, ranging from simple direct queries to complex, multi-step SQL instructions. Quick read: https://t.co/v0ILuQR2W1 Model: https://t.co/ZFkd43vY1I Demo: https://t.co/JOtSxUiWFL @defogdata #ai #DataScientist #sql

Marktechpost's tweet photo. Defog AI Introduces LLama-3-based SQLCoder-8B: A State-of-the-Art AI Model for Generating SQL Queries from Natural Language

Defog introduced LLama-3-based SQLCoder-8B, a state-of-the-art model for generating SQL queries from natural language. This new model stands out by addressing the limitations of prior systems. Traditional models often buckle under the pressure of complex, instruction-heavy queries or fail to adapt to the nuances presented by different database frameworks. SQLCoder-8B revolutionizes this landscape by integrating a broader spectrum of training data encompassing various instructions and more challenging SQL generation tasks.

SQLCoder-8B distinguishes itself through a refined methodology that significantly enhances its capability to process and follow intricate instructions, leading to highly accurate SQL outputs. The model has been rigorously trained on a dataset enriched with diverse SQL query scenarios. This training is designed to equip the model with the versatility to tackle real-world applications, ranging from simple direct queries to complex, multi-step SQL instructions.

Quick read: https://t.co/v0ILuQR2W1

Model: https://t.co/ZFkd43vY1I

Demo: https://t.co/JOtSxUiWFL

@defogdata #ai #DataScientist #sql

tryfactiq retweeted

about 2 years ago

Llama-3 based SQLCoder 8b is out! Open weights with a commercially friendly cc-by-sa license. Probably the best <10B param model for Postgres text to SQL right now. Slightly better than gpt-4-turbo and claude opus for 0-shot text to SQL generation. Also approaches their performance when following instructions. Weights on @huggingface: https://t.co/lg0A2f4tqc Demo (optimized for postgres): https://t.co/qp4zvZ52xV More technical details below! What's new about this model Our previous small model (sqlcoder-7b-2) was good at generating 0-shot SQL, but did terribly at following instructions. So while it was great in our evals, it was lacking in real-world use-cases where instruction following is much more important. To address this, we trained this model with much more instruction data. We also made our original eval much harder to make sure we stayed on the right track. Changes to evals There were 3 changes to our original eval: 1. Previously, we pruned the database schema to only consider the 20 relevant columns in the DDL statements. We have now removed pruning that so that all columns in a database are used 2. We previously used beam search with 4 beams to make our results more accurate. But with a large number of input and/or output tokens, that increased memory requirements and became computationally intractable. So we have shifted to a single beam now. 3. We added 104 complex instruction-following text=> SQL questions questions to our evals, in addition to the 200 0-shot questions that were already there. Link to our eval framework here: https://t.co/n0CxuKqjPf Changes to prompt You previously had to use our slightly idiosyncratic prompt for best results. Now, you can just use the standard Llama-3 instruct prompt. 70B model, technical report, and more up next We've also been training a llama-3 based 70B model right now. It's still training and will get better over time – but even an AWQ quantized version of our interim model is giving excellent results for now. We hope to open-source the 70B next week. We also have a technical report coming up next week (or over the weekend, if I can be productive enough on a flight) about the training methods used for this model. More on that soon! Feedback very much appreciated! In the meantime, please send us your feedback as you try out the model - specially if you see failure modes. Would very much appreciate it!

rishdotblog's tweet photo. Llama-3 based SQLCoder 8b is out! Open weights with a commercially friendly cc-by-sa license. Probably the best <10B param model for Postgres text to SQL right now.

Slightly better than gpt-4-turbo and claude opus for 0-shot text to SQL generation. Also approaches their performance when following instructions.

Weights on @huggingface: https://t.co/lg0A2f4tqc

Demo (optimized for postgres): https://t.co/qp4zvZ52xV

More technical details below!

What's new about this model
Our previous small model (sqlcoder-7b-2) was good at generating 0-shot SQL, but did terribly at following instructions. So while it was great in our evals, it was lacking in real-world use-cases where instruction following is much more important.

To address this, we trained this model with much more instruction data. We also made our original eval much harder to make sure we stayed on the right track.

Changes to evals
There were 3 changes to our original eval:

1. Previously, we pruned the database schema to only consider the 20 relevant columns in the DDL statements. We have now removed pruning that so that all columns in a database are used

2. We previously used beam search with 4 beams to make our results more accurate. But with a large number of input and/or output tokens, that increased memory requirements and became computationally intractable. So we have shifted to a single beam now.

3. We added 104 complex instruction-following text=> SQL questions questions to our evals, in addition to the 200 0-shot questions that were already there.

Link to our eval framework here: https://t.co/n0CxuKqjPf

Changes to prompt
You previously had to use our slightly idiosyncratic prompt for best results. Now, you can just use the standard Llama-3 instruct prompt.

70B model, technical report, and more up next
We've also been training a llama-3 based 70B model right now. It's still training and will get better over time – but even an AWQ quantized version of our interim model is giving excellent results for now. We hope to open-source the 70B next week.

We also have a technical report coming up next week (or over the weekend, if I can be productive enough on a flight) about the training methods used for this model. More on that soon!

Feedback very much appreciated!
In the meantime, please send us your feedback as you try out the model - specially if you see failure modes. Would very much appreciate it!

572

115

497

97K

tryfactiq retweeted

about 2 years ago

Cloudflare's new AI announcements look fun! Check out sqlcoder-7b-2 on their playground :D https://t.co/HYkXIcls8r Unfortunately allows only for chat-styled inference right now (which we are not optimized for) – but still outperforms other models for text to SQL tasks!

rishdotblog's tweet photo. Cloudflare's new AI announcements look fun! Check out sqlcoder-7b-2 on their playground :D https://t.co/HYkXIcls8r

Unfortunately allows only for chat-styled inference right now (which we are not optimized for) – but still outperforms other models for text to SQL tasks! https://t.co/LIKNpEXf76

tryfactiq retweeted

anmol maini

@anmolm_

about 2 years ago

👀 @defogdata

tryfactiq retweeted

over 2 years ago

Pretty cool to see this on HF trending today :D Also, building some fun MLX integrations, thanks to @Ubunta's awesome MLX port. Already a part of sql-eval in this PR: https://t.co/axhUzur7gM

rishdotblog's tweet photo. Pretty cool to see this on HF trending today :D

Also, building some fun MLX integrations, thanks to @Ubunta's awesome MLX port. Already a part of sql-eval in this PR: https://t.co/axhUzur7gM https://t.co/CNxn8x7K4c

over 2 years ago

@david_sabata @AIatMeta Nope – only applies if you modify the model weights!

393

over 2 years ago

And here it is! SQLCoder-70B outperforms all other models in the market for Postgres text-to-SQL generation by a wide margin. We got these results by fine-tuning @AIatMeta CodeLlama-70B, which was released yesterday, on less than 20,000 hand-curated prompt-completion pairs. You can find SQLCoder-70B on @huggingface at https://t.co/fPmIrYaLAv This follows our 15B, 7B, and 34B models – and is the most capable of them all. The model has a CC BY-SA 4.0 license, which means that you are free to use it as is for any use (including commercial) as long as you also open-source any changes you make to it (eg, if you fine-tune it further).

tryfactiq's tweet photo. And here it is! SQLCoder-70B outperforms all other models in the market for Postgres text-to-SQL generation by a wide margin.

We got these results by fine-tuning @AIatMeta CodeLlama-70B, which was released yesterday, on less than 20,000 hand-curated prompt-completion pairs.

You can find SQLCoder-70B on @huggingface
at https://t.co/fPmIrYaLAv This follows our 15B, 7B, and 34B models – and is the most capable of them all.

The model has a CC BY-SA 4.0 license, which means that you are free to use it as is for any use (including commercial) as long as you also open-source any changes you make to it (eg, if you fine-tune it further).

10K

tryfactiq retweeted

Mati @KMatiDev1

over 2 years ago

Just reproduce SQLCoder-7b-2 (fp16) from @defog with a beam_size of 1. It achieved around 87%, as mentioned by @rishdotblog earlier. So, how about AWQ and int8 KVcache? Let's guess. 🧐

KMatiDev1's tweet photo. Just reproduce SQLCoder-7b-2 (fp16) from @defog with a beam_size of 1. It achieved around 87%, as mentioned by @rishdotblog earlier.

So, how about AWQ and int8 KVcache?
Let's guess. 🧐 https://t.co/ruxIyvMOMr

tryfactiq retweeted

over 2 years ago

You can now run SQLCoder with a GUI on Apple Silicon or any NVIDIA GPU-enabled device! On Apple Silicon, just run CMAKE_ARGS="-DLLAMA_METAL=on" pip install "sqlcoder[llama-cpp]" sqlcoder launch The Apple Silicon version is not super accurate, but works great for simple questions – and everything happens (fairly fast) on just your laptop! We will add support for more platforms soon. PRs are very welcome! Github: https://t.co/zgMT0C1IeS

10K

tryfactiq retweeted

over 2 years ago

Two big updates today! We updated the weights for sqlcoder-7b-2, and it now outperforms GPT-4 for most SQL queries – specially if you give it the right instructions and prompt well @huggingface link here: https://t.co/8TeOBdu5Rf 2) We've added basic instruction following capabilities, including the ability to say "I don't know" when a question cannot be confidently answered given a database schema. More on that here! https://t.co/v3w8lwqUsr 3) We've got a Colab notebook up if you want to explore the model (though just with a single beam, for now): https://t.co/KkHuhR4Pir

rishdotblog's tweet photo. Two big updates today!

We updated the weights for sqlcoder-7b-2, and it now outperforms GPT-4 for most SQL queries – specially if you give it the right instructions and prompt well

@huggingface link here: https://t.co/8TeOBdu5Rf

2) We've added basic instruction following capabilities, including the ability to say "I don't know" when a question cannot be confidently answered given a database schema. More on that here! https://t.co/v3w8lwqUsr

3) We've got a Colab notebook up if you want to explore the model (though just with a single beam, for now): https://t.co/KkHuhR4Pir

236

157

31K