Super excited to share that our paper "Akal Badi ya Bias: An Exploratory Study of Gender bias in Hindi" has been accepted to #facct2024. Preprint available soon.
Really proud of this work. It is a huge group effort in partnership with @karya_inc [1/n]
@perplexity_ai just made a major contribution to our repo, which launched 4 weeks ago!
@JamesLiounis_ shipped first-class Sonar + Agent API support to our open-source repo.
Route Perplexity Sonar through the Future AGI gateway with caching, fallback chains, and guardrails. Use it in experiments, prompts, and evals too. Real-time web access and citations baked in.
(https://t.co/MDEkaQx6dI)
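For anyone wondering what "caching + fallback chains" means in practice, here's a rough conceptual sketch. It is not the Future AGI gateway API or the Perplexity SDK: every name in it (`FallbackGateway`, `flaky_primary`, `backup`) is hypothetical, and the providers are stand-in functions instead of real model calls.

```python
from typing import Callable, Dict, List

# Hypothetical provider type: takes a prompt, returns a text answer.
Provider = Callable[[str], str]


class FallbackGateway:
    """Toy gateway: checks an in-memory cache, then tries providers in order."""

    def __init__(self, providers: List[Provider]) -> None:
        self.providers = providers
        self.cache: Dict[str, str] = {}

    def complete(self, prompt: str) -> str:
        # Cache hit: skip every provider call.
        if prompt in self.cache:
            return self.cache[prompt]
        errors = []
        for provider in self.providers:
            try:
                answer = provider(prompt)
            except Exception as exc:
                # This provider failed, so fall through to the next one.
                errors.append(exc)
                continue
            self.cache[prompt] = answer
            return answer
        raise RuntimeError(f"all {len(self.providers)} providers failed: {errors}")


# Stand-in providers (assumptions for illustration, not real API clients).
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary unavailable")


def backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


gateway = FallbackGateway([flaky_primary, backup])
print(gateway.complete("What is FAccT?"))
```

A real gateway would add things this sketch leaves out, such as cache expiry, per-provider timeouts, and guardrail checks on the answer before it is returned.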
Today is the biggest day for us at @FutureAGI_ as we go fully open source.
Hundreds of teams have trusted us to build self-improving AI agents. And as of today, it's on GitHub for every AI team on earth to use, extend, and build on.
Long story short, I have been in AI infra long enough to know that everyone talks about agents that learn and improve. Nobody ships the platform that makes it happen.
We did. And this is just the first step towards building truly autonomous AI - infrastructure where your agent gets better every deployment, without ever changing the LLM.
Here's why this had to exist. The teams doing the hardest work in AI are burning 6-figure engineering cycles debugging the same hallucinations on loop. That's not a team failure - it's a tooling gap. Non-deterministic systems can't be engineered with tools designed for deterministic software. The math doesn't work.
That's the gap Future AGI was built to close. And today, it's yours.
The entire stack. UI. Backend. Simulation. Evals. Optimization. Guardrails. Gateways. Everything you need to build agents that actually stress test themselves and improve from production data, autonomously.
Because the real unlock isn't better observability, it's closing the loop where:
Agent fails → system simulates why → runs evals → generates fix → validates on real traffic → deploys → monitors for regressions.
Why open source? Because asking you to trust a closed system to autonomously improve your AI is absurd. You need to see the learning mechanisms. Inspect what's changing. Validate the optimization strategies.
This is bigger than one company. Self-improving AI will define the next decade. And it starts with infrastructure everyone can build on.
→ GitHub link below. Star it. Run it. Push it to its limits.
You’ve curated the sources. You’ve researched everything. You know exactly what you want to say.
You just can't get it out of your head and onto the page. We are building Almanac exactly for this.
Experience the beta version here: https://t.co/F7Q7Pfgz7y
Here are a few things you can do with Almanac👇
All of this runs at ultra-low latency for real-time production interception. We have open-sourced our text adapters on HF: https://t.co/hD6gVpA84Y
We're just getting started!
[n/n]
For the last year, friends kept telling me my feed is all product updates, a big shift from my research posts.
We’ve been quietly brewing at @FutureAGI_.
Last month, we announced AgentCompass, and the response was incredible.
Last week, we introduced Protect.
[1/n]
Here's a summary:
1. We unify text, image & audio safety under one framework.
2. We adopt Teacher-assisted relabeling for explainable, high-fidelity data.
3. Protect shows SOTA performance on public benchmarks, surpassing WildGuard, LlamaGuard-4, and GPT-4.1.
[3/n]
Most marketing stacks use AI. Few are truly intelligent.
Join Bhavneet Kaur (VP, AI @ C5i) in conversation with @itsjustnikhil from Future AGI to explore how to build AI-native marketing platforms that actually deliver:
– Reading the MarTech shift
– Architecting predictive layers
– Scaling with trust & speed
🗓 July 1 | 9:30 AM PT
RSVP → https://t.co/3nuS12qVUY
#FutureAGI #MarTech #GenAI #ReliableAI
Future AGI lands at @superai_conf Singapore! 🇸🇬
If you're thinking beyond benchmarks and into real-world reliability, come talk to @itsjustnikhil.
Don't miss our founder’s talk on "Building Reliable AI" at The Forum. This session will equip you with the essential tools to drive the next generation of AI using evals, observability, and intelligent guardrails to bridge the gap between capability and confidence.
Let’s talk about the future and how to build it responsibly.
After amazing participation in our first two sessions, where we did a deep dive into how to set up smarter evaluations for your GenAI applications, we're back with a third one. This time we'll be discussing strategies for scaling AI engineering. Looking forward to this one!
Is your AI stack ready for agentic scale?
Join Sandeep Kaipu (Engineering Leader @Broadcom) and our founder, @itsjustnikhil, as they share a practical playbook for scaling GenAI infra, aligning with KPIs, and securing compliance.
📅 May 8 | 9:30 AM PT
🔗 RSVP → https://t.co/m1Z0iJ9lw0
#FutureAGI #AIInfrastructure #EnterpriseAI #AIAgents
After amazing participation in our 1st webinar, we're back with another one. This time, we're getting our hands dirty and diving deep into the many layers of AI evaluation. We'll cover practical techniques to spot issues early, enhance data quality, & build more reliable GenAI apps.
Everyone talks about building AI. Few talk about evaluating it well.
Future AGI is hosting a webinar on AI evaluation—catching issues early, refining datasets, and ensuring trustworthy GenAI.
📅 April 4 | 9:30 AM PT | Online
🔗 Register here → https://t.co/E2SwZpl9sd
Let's talk a bit about authorship order on a paper. Yes, everyone cares about it, and it can become very emotional. Even if you think that big professors don't care, they do (although I know two professors who don't—you can probably guess who).
Excited for everyone to try them out (https://t.co/YI8d8ZJxsC), and always eager to hear your thoughts and feedback. We are just getting started 🚀 [n/n]
Thrilled to see our work recognized by Forbes (https://t.co/tDECEB6gWT). Grateful to be part of this incredible team led by @itsjustnikhil and Charu! [1/n]
At @FutureAGI_, we have developed state-of-the-art evaluation methods tailored for real-world business use cases. Some of my favorite features include error localization in input data and a robust framework for synthetic data generation, among others. [3/n]
Wanted to share a thought I had about training LLMs after reading the DeepSeek-R1 paper. At a high level, it can help incorporate feedback from humans/models/rules into the training process dynamically, as and when required.
More details here: https://t.co/NSYL6HRyii