Sup AI @SupAIHQ - Twitter Profile

Pinned Tweet

6 months ago

New SOTA on Humanity's Last Exam (HLE)

We have achieved 52.15% accuracy on the world's hardest open-source AI reasoning test, setting a new benchmark record.

Sup AI is now outperforming every individual frontier model, including Gemini 3 Pro Preview and GPT-5 Pro.

Our lead over the next best model? +7.49 points.

Check the full evaluation & code:
https://t.co/l8FuQDRfxI

#AI #MachineLearning #HLE #SupAI

3

8

4

2

1K

Sup AI

@supaihq

2 months ago

Try Sup AI: https://t.co/34mExCpkHw Support us on Product Hunt: https://t.co/JD5hq8poLG

1

2

0

21

Sup AI

@supaihq

2 months ago

Sup AI is live on @ProductHunt 🚀

"Which AI model is the best?"

Wrong question.

The best model isn't a model. It's an orchestra.

Sup AI runs 9 frontier models in parallel and synthesizes their answers → 52.15% on HLE benchmark (without the help of tools).

→ Multi-model consensus (up to 9 models)
→ Ensemble RAG with live web + your files
→ Every claim cited

$10 free credit to start
20% off with code: PRODUCTHUNT

Links below 👇

3

1

0

160

Sup AI

@supaihq

2 months ago

We just launched Sup AI on @ProductHunt! We combine multiple AI models and use confidence scoring to give better answers with fewer hallucinations. #1 on Humanity's Last Exam: 52.15%. Beating every individual model. $10 starter credit to try it, and 20% off your first month with code "PRODUCTHUNT" https://t.co/lskoGKfWBE

1

4

0

44

Sup AI

@supaihq

3 months ago

@VictorTaelin You don't have to pick: https://t.co/8ysXKdNmFQ

0

1

0

7

Sup AI

@supaihq

4 months ago

Love seeing @Perplexity ship Model Council. Multi-model is the right direction.

At Sup AI, we've pushed this further: 9-model ensembles + segment-level confidence scoring (logprob signals across every claim).

Text can lie. A model can sound 100% confident while hallucinating. The math doesn't lie.

Result: 52.15% HLE (SOTA) + 3 questions solved where ALL 9 individual models failed.

The future isn't "which model is best." It's "what does each model know vs. what is it guessing?"

Perplexity

@perplexity_ai

4 months ago

Introducing Model Council in Perplexity. Run three frontier models at once, compare outputs, and get a more accurate, higher‑confidence answer. Available now on web only for Perplexity Max subscribers.
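The Sup AI tweet above mentions segment-level confidence scoring built on logprob signals but gives no formula. One common way to turn token log-probabilities into a per-segment score is the geometric mean of the token probabilities, sketched below. This is an assumption for illustration, not Sup AI's documented method; the function names and the 0.5 threshold are hypothetical.

```python
import math

def segment_confidence(token_logprobs):
    """Geometric-mean token probability for one segment: exp(mean logprob).

    Assumed scoring rule for illustration; the source does not specify one.
    """
    if not token_logprobs:
        raise ValueError("segment has no tokens")
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def flag_low_confidence(segments, threshold=0.5):
    """Return the text of every segment whose score falls below the threshold.

    segments: list of (text, token_logprobs) pairs.
    threshold: hypothetical cut-off, chosen only for this example.
    """
    return [text for text, logprobs in segments
            if segment_confidence(logprobs) < threshold]
```

A score near 1.0 means the model assigned high probability to every token in the segment; a fluent sentence can still score low, which is the point the tweet makes about text sounding confident while the underlying probabilities are not.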

181

2K

219

603

411K

0

1

0

141

Sup AI

@supaihq

4 months ago

Run this through 9 models in parallel and you get 45-path reasoning automatically. Diversity beats perfection. Every time.

0

37

Sup AI

@supaihq

4 months ago

This is exactly right. And it compounds with model diversity.

At Sup AI: 5 prompt variations × 9 frontier models = 45 reasoning paths cross-validated before synthesis.

Single prompt on single model = leaving 90% of accuracy gains on the table.

My friend Gary Gurevich built a "hyperplane metaprompt" that automates the prompt side: generates 5 non-overlapping angles, predicts objections, synthesizes with traceability.

Full template 👇

God of Prompt

@godofprompt

4 months ago

Stanford researchers just published a prompting technique that makes today’s LLMs behave like better versions of themselves.

It’s called “prompt ensembling” and it runs 5 variations of the same prompt, then merges the outputs.

Here’s how it works 👇 https://t.co/MOfArDE9P1
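The "5 prompt variations × 9 frontier models = 45 reasoning paths" arithmetic in the Sup AI tweet above is a Cartesian product of prompts and models. A minimal sketch follows, with placeholder names for the variations and models (the tweets do not list them) and a plain majority vote standing in for the synthesis step, which the source does not describe.

```python
from collections import Counter
from itertools import product

# Hypothetical placeholders: the source names neither the variations nor the models.
PROMPT_VARIATIONS = [f"variation_{i}" for i in range(1, 6)]  # 5 rewrites of one question
MODELS = [f"model_{j}" for j in range(1, 10)]                # 9 models

def build_paths(variations, models):
    """Pair every prompt variation with every model; each pair is one reasoning path."""
    return list(product(variations, models))

def majority_answer(answers):
    """Simple majority vote over the collected answers (a stand-in for synthesis).

    Ties resolve to the answer that appeared first.
    """
    return Counter(answers).most_common(1)[0][0]

paths = build_paths(PROMPT_VARIATIONS, MODELS)
print(len(paths))  # 45
```

In a real pipeline each `(variation, model)` pair would be sent to the corresponding model, and the 45 answers would then be merged; the vote here only shows where that merge would sit.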

38

783

108

2K

83K

1

0

110

Sup AI

@supaihq

4 months ago

Gary's Hyperplane Method: "Generate a metaprompt to restate any prompt 4 ways (sharpening, scope-widening, cross-domain). Each restatement's center of mass overlays the original but extends in NON-OVERLAPPING directions. Answer all 5. Predict my objections. Answer those. Synthesize with full traceability." [your prompt]

1

0

44

Sup AI

@supaihq

4 months ago

Unpopular opinion: The AI model race is a distraction.

See this tug-of-war? 👇

9 AI models vs. 1 "best" model.

The crowd wins. Every time.

No single LLM excels at everything: Claude crushes analysis, GPT-5 dominates creative, Gemini nails structured data.

Orchestration intelligently routes each task to the RIGHT specialist.

Sup AI proved it: 52.15% on Humanity's Last Exam, beating Gemini 3 Pro by 7.5 points.

The companies winning in 2026 won't have the "best" model.

They'll be the ones who stopped picking sides.

Does orchestration become a first-class category this year? 👇

#AI #AIOrchestration #MultiModel

0

2

0

69

Sup AI

@supaihq

5 months ago

Microsoft CEO Satya Nadella just confirmed the Sup AI thesis: "Assigning roles to models and orchestrating them gets better results than any single frontier model."

We’ve built the engine to prove it.

• 52.15% accuracy
• +7.4 percentage points vs. single models
• Available today

Stop waiting for the next GPT. Start orchestrating. 🎯

0

2

0

1

74

Sup AI

@supaihq

5 months ago

AI agents don't fail like chatbots…
AI agents fail like software in production.

One bad action breaks trust.

@usevemly AI employees close tickets and update CRMs in live systems. Early on: too confident, too many errors.

Fix: Sup AI as decision layer
→ Multiple models propose actions
→ Only executes on high consensus + confidence
→ Otherwise: blocked or escalated

Results:
• 93% fewer incorrect tool calls
• 41% faster resolution
• 100% enterprise approval

Full case study: https://t.co/zzw18DUi2y

Autonomy you can actually trust.
#AgenticAI #EnterpriseAI
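The decision layer described above (several models propose actions, execution only on high consensus and confidence, otherwise block or escalate) can be sketched as a simple gate. Everything here is an illustrative assumption: the `Proposal` type, the `decide` function, and both thresholds are hypothetical, and the case study may define consensus and confidence differently.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Proposal:
    model: str         # hypothetical model label
    action: str        # proposed tool call, e.g. "close_ticket"
    confidence: float  # 0.0 to 1.0, however the caller estimates it

def decide(proposals, min_agreement=0.8, min_confidence=0.7):
    """Return ("execute", action) only when the models agree and are confident.

    Any other case returns ("escalate", None) so a human or fallback handles it.
    Both thresholds are made-up defaults for this sketch.
    """
    if not proposals:
        return ("escalate", None)
    action, votes = Counter(p.action for p in proposals).most_common(1)[0]
    agreement = votes / len(proposals)
    backers = [p.confidence for p in proposals if p.action == action]
    mean_confidence = sum(backers) / len(backers)
    if agreement >= min_agreement and mean_confidence >= min_confidence:
        return ("execute", action)
    return ("escalate", None)
```

The gate fails closed: an empty proposal list, a split vote, or a unanimous but low-confidence vote all end in escalation rather than a tool call.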

0

1

0

32

Sup AI

@supaihq

5 months ago

☑️ Pro Mode → Expert Mode
☑️ Orchestrator now auto-picks thinking effort per model = massive cost savings + fixes slow GPT-5.2 Pro
☑️ Advanced model selector with per-model controls
☑️ Timestamps + generation times on all messages https://t.co/sRskPFUfdY

0

52

Sup AI

@supaihq

5 months ago

Sup AI memory just leveled up

We upgraded from Voyage Multimodal 3 → 3.5 with @VoyageAI

* Best-in-class multimodal RAG
* More accurate chat memories
* Hyper-personalized answers
* Everything becomes permanent knowledge
#SupAI #VoyageAI #Multimodal #RAG https://t.co/pJI1MN4SAC

0

1

0

44

Sup AI

@supaihq

5 months ago

Sup AI Chrome Extension is live

Your address bar → direct access to frontier models with forced citations.
→ Default search goes to Sup AI
→ !g for instant Google fallback
→ mode=fast / thinking / deep-thinking / pro
→ models=gemini-3-flash or models=qwen3-max,gemini-3-flash
→ Zero permissions. Zero data collection.
https://t.co/lQ24BvXmpT

0

3

0

90

Sup AI

@supaihq

5 months ago

3/ At Sup AI, we've seen this pattern work. Our multi-model orchestration scored 52.15% on Humanity's Last Exam: +7.49 points above any single frontier model. The future isn't bigger models. It's smarter systems.

0

3

0

50

Sup AI

@supaihq

5 months ago

1/ AI just solved an Erdős problem confirmed by @terencetao

GPT-5.2 cracked Problem #728, a conjecture unsolved for decades.

But the breakthrough isn't "one smart model." It's the architecture. https://t.co/F4rBkhJIHr

1

3

1

0

456

Sup AI

@supaihq

5 months ago

2/ The solution required ORCHESTRATION:

• GPT-5.2 generated the proof (intuition) @sama
• Harmonic's Aristotle verified it in Lean (rigor) @vladtenev
• Human feedback refined the approach @terencetao

This is constructive synthesis in action.

1

3

0

58

Sup AI

@supaihq

5 months ago

Sup AI whitepaper is live on the methodology behind 52.15% on HLE:

• 3 correct answers synthesized when EVERY model failed
• Grok 4 (29%) uniquely solved 16 Qs vs GPT-5 Pro's 9 (40%)
• Low correlation pairs > high accuracy pairs
• 58.44% theoretical ceiling w/ models
• 42% Qs unsolved by ANY model
• Full methodology, IQ curves, correlation matrices: https://t.co/EiKtyUOGzo

#AI #MachineLearning #OpenSource #AIResearch #EnsembleAI #AIOrchestration #HLE
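The whitepaper figures above read like standard ensemble statistics: a "theoretical ceiling" is usually the share of questions that at least one model answers correctly, and the 58.44% ceiling and 42% unsolved figures are consistent with that reading (100 − 58.44 = 41.56). The sketch below computes both quantities, plus per-model unique solves, on a made-up correctness table. The data and function names are hypothetical, and the whitepaper may define these metrics differently.

```python
# Toy data only: results[model][i] is True if that model answered question i correctly.
results = {
    "model_a": [True, False, False, True, False],
    "model_b": [True, True, False, False, False],
    "model_c": [False, False, False, True, False],
}

def oracle_ceiling(results):
    """Fraction of questions solved by at least one model."""
    columns = list(zip(*results.values()))  # one tuple of outcomes per question
    return sum(any(col) for col in columns) / len(columns)

def unique_solves(results):
    """Count, per model, the questions that only that model answered correctly."""
    names = list(results)
    counts = {name: 0 for name in names}
    for col in zip(*results.values()):
        if sum(col) == 1:                   # exactly one model got it right
            counts[names[col.index(True)]] += 1
    return counts

ceiling = oracle_ceiling(results)
unsolved = 1 - ceiling
```

On this toy table three of five questions are solved by some model, and only `model_b` has a unique solve. A model with low overall accuracy can still raise the ceiling if its correct answers fall where the others fail, which is the pattern the Grok 4 bullet describes.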

0

3

2

1

468
