I genuinely think we built the best search engine for official economic data.
Been working on this for 6 months. We spent ~$100k in tokens to structure economic data and make it easier to search.
It answers economic data questions really well. From "What has been the actual impact of AI on software engineering jobs in the last 2 years?" to "Why did egg prices increase so much more than chicken prices in the last 5 years?"
Would love feedback (the more blunt the better). We have a generous free tier for the next week!
Excited to launch FactIQ today! 🚀
We just indexed 7.4M+ official US data series to build the ultimate economic research agent.
Visualize trends instantly. Verify every source. Export charts for your reports.
Free for the next week - try it out at factiq[dot]com!
Today, we’re re-launching FactIQ.
Six months ago, we launched a search and visualization engine for economic data. You could ask for a dataset, find the right series, and turn it into a chart.
But over the last few months, the role of agents has changed.
They are no longer just useful for saving a few minutes on repetitive work. They now act like a tireless second pair of eyes: digging through data, testing competing explanations, and surfacing evidence you may not have looked for.
That opened up a much bigger opportunity for FactIQ.
Macro analysts don’t just need another way to make charts. They need to answer the hardest and most important question in research: what’s actually happening?
Answering that means looking beyond the obvious narrative. Beyond the single indicator everyone is watching. Beyond the chart that confirms what you already believed.
The new FactIQ turns a macro question into an investigation.
It breaks the question into the explanations that could be true, searches across official data, global institutions, government releases, news, and trusted industry sources, and tests which explanations are actually supported by the evidence.
The goal is simple: give every macro analyst the capabilities of an institutional research desk.
Use FactIQ to write macro notes, brief clients, support investment decisions, or pressure-test your view before publishing it.
Try it today at https://t.co/o5PXgDbnq3. We would love to know what you think.
Here's a report re AI and power in the US that it just produced with the @tryfactiq harness: https://t.co/uItwbAsBlx
Super sharp, and much much much better than what we were getting with 4.7
I like Opus 4.8 so far!
- way more token efficient than 4.7
- clearly better at financial analysis, dataviz, and writing
- far less hedging and handwavy explanations
- works extremely well in non-Anthropic harnesses, too
Fantastic for finance/econ agents
@Forbu14 their model is great! the valuation is still absurd.
deepseek (which invented most of the things used in GLM - DSA, MLA, MTP, GRPO) is raising at *half* zhipu's current valuation with 6x their traffic on OpenRouter
Jfc Asian public markets are irrational when it comes to AI
Zhipu (GLM maker) is trading at a US$92B valuation, with ttm sales of ~US$100M (920x multiple)
Wild to see a public co trading on Series A metrics
The bubble is already here. It's just not evenly distributed.
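For anyone checking the math: the multiple is just valuation divided by trailing-twelve-month sales. A quick sketch using the rounded figures quoted above (both are approximations from the post, and the function name is only illustrative):

```python
def price_to_sales(market_cap_usd: float, ttm_sales_usd: float) -> float:
    """Return the price-to-sales multiple (valuation / trailing-12-month sales)."""
    if ttm_sales_usd <= 0:
        raise ValueError("ttm sales must be positive")
    return market_cap_usd / ttm_sales_usd


# Rounded figures quoted above: ~US$92B valuation, ~US$100M ttm sales.
multiple = price_to_sales(92e9, 100e6)
print(f"{multiple:.0f}x")  # 920x
```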
Sadness. Gemini 3.5 Flash is as haunted as 3 Flash
I _really_ wanted it to work. But it's totally broken in non-Google harnesses
Way slower (and worse) than GPT-5.5/Opus at tool-chaining - despite the high output tok/s
WTF did Google do to Gemini 3 Flash during post-training? It's a tortured model
If given tasks it can't do (because of insufficient tools), it just... keeps trying.
Making failed tool calls 100 times, even if explicitly told in the system prompt that it's okay to give up.
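If the model won't give up on its own, the harness can do it for it. A minimal sketch of a harness-side guard, assuming tool calls arrive as plain dicts and a failing tool raises an exception. `run_tool_calls`, `ToolBudgetExceeded`, and the default of 3 are all hypothetical names and values, not from any real harness:

```python
from typing import Callable, Iterable


class ToolBudgetExceeded(Exception):
    """Raised when the model keeps issuing failing tool calls."""


def run_tool_calls(
    calls: Iterable[dict],
    execute: Callable[[dict], object],
    max_consecutive_failures: int = 3,
) -> list:
    """Execute tool calls in order, aborting after too many failures in a row."""
    results = []
    failures = 0
    for call in calls:
        try:
            results.append(execute(call))
            failures = 0  # any success resets the streak
        except Exception as exc:
            failures += 1
            results.append({"error": str(exc)})
            if failures >= max_consecutive_failures:
                raise ToolBudgetExceeded(
                    f"{failures} consecutive failed tool calls; giving up"
                ) from exc
    return results
```

The cap is on consecutive failures rather than total failures, so an agent that hits an occasional error and recovers is not cut off.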
didn't expect this, but codex with gpt-5.5 medium has become my daily driver
only situations where i use something else are:
- complex backend work (use gpt-5.5 xhigh for this)
- initial ideation with vague prompts (claude code w opus 4.6)
- UI work (opus 4.7)
codex xhigh just unslopified a gnarly file that was a 3000 line mess. had tried everything (opus 4.6, opus 4.7, gpt-5.4, gpt-5.3-codex) to refactor this. none of those had worked without causing new regressions or race conditions
5.5 xhigh one shotted it
Early GPT-5.5 impressions - finally an OAI model that matches Claude at tool calls
Until yesterday, Opus/Sonnet were the only reliable options if you wanted to build a fast, long-running agent. GPT-5.4 was good, but thought too much, and was super slow. It also polluted the context with too many thinking tokens and so had degraded performance at long-running tasks.
Gemini models are... just weird at tool calling - they often get stuck in infinite loops.
GPT-5.5 costs slightly less than Opus across the tool calling loop, is just as fast, and just as good. I like its personality more - much less hedging (especially for things like financial analysis) and more to the point.
It's also much more broadly useful. Codex with GPT-5.4 was pretty good at code, but Opus was just better for general tasks. GPT-5.5 feels super competent across the board.
Really excited for this release. Makes the market for agent LLMs competitive again!
for frontend agents, don't go from specs to code directly
instead, go specs -> gpt-image-2 -> frontend code
beats any coding agent out there. phenomenal tip from @reach_vb!
@shannholmberg @reach_vb codex has the `$imagegen` skill which can be used directly
if in your own harness, use the openai api for generating an image from the spec first - then pass that image along with the spec to gpt or claude to write frontend code
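Roughly, the own-harness flow looks like this. It's a sketch of the two-step shape only: `generate_mockup` and `write_code` are hypothetical callables you would wire up to whatever image and coding models you use, and no real API calls are shown or implied:

```python
from typing import Callable


def spec_to_frontend(
    spec: str,
    generate_mockup: Callable[[str], bytes],
    write_code: Callable[[str, bytes], str],
) -> str:
    """Two-step pipeline: spec -> mockup image -> frontend code."""
    mockup = generate_mockup(spec)  # step 1: render the spec as an image
    if not mockup:
        raise ValueError("image step returned no data")
    return write_code(spec, mockup)  # step 2: code from spec + image together


# Stand-in callables so the sketch runs without any network access.
def fake_mockup(spec: str) -> bytes:
    return f"image-for:{spec}".encode()


def fake_coder(spec: str, image: bytes) -> str:
    return f"<!-- {spec} ({len(image)} image bytes) -->"


html = spec_to_frontend("pricing page", fake_mockup, fake_coder)
```

The point of passing both the spec and the image to the second step is that the coding model gets the written requirements and the visual target at once.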
it's been _really_ good on the whole! main gaps with Opus:
- frontend aesthetics, especially data-viz
- synthesizing financial/economic data [1]
where I feel it's *better* than opus:
- makes fewer tool calls and gets them right. really helps save on costs and latency in agentic loops
- very very good at self correcting and escaping doom loops
[1] take a quick look at this report with GPT-5.5 as the driver model with GPT-5.5 subagents https://t.co/bY3POUOlQ5
You can look at the actual tool calls it made by clicking on the "Done in 11m" stepper + clicking through to see the trajectory. _Extremely_ solid. Couldn't be happier.
But the final result (synthesizing the results of subagents) that you read in the report was off. That bit will obv improve in the next versions - but it's the only big gap with opus that I see in my testing so far!
@reach_vb > use imagegen paired to generate what you want and then ask codex to build
brilliant. asking codex to implement this rn! will report back how it goes
The last time I dealt with model regressions this bad was in the GPT-4 era. Sonnet is borderline unusable today.
Anthropic will lose so much market share if they don't secure more compute, and fast.