When doing research, +10% nDCG@10 over a baseline is an excellent result; when building a search product, we aim for 0.9+ nDCG@10.
Research is about relative improvements while product quality is absolute.
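To make the relative-versus-absolute point concrete, here is a minimal sketch of nDCG@10 in Python. It assumes graded relevance labels with linear gain, and it assumes the ranked list contains every judged document (so the ideal ordering is just the same labels sorted). The function names and the 0.50 and 0.55 scores are hypothetical illustrations, not numbers from any benchmark.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    # rank is 0-based, so the discount for the first position is log2(2) = 1.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(gains, k=10):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def relative_gain(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score


# Hypothetical scores: a +10% relative gain can still sit far below 0.9.
baseline, improved = 0.50, 0.55
print(f"relative gain: {relative_gain(improved, baseline):.0%}")
print(f"gap to a 0.9 product target: {0.9 - improved:.2f}")
```

In this toy case the research result is a clear win while the product target is still 0.35 nDCG points away.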
Also, great search is different from IR research.
When building a search engine for users, where every irrelevant result can lead to churn, we learn from the past to predict the future. When doing research, we learn to generalise to an unseen part of the data distribution.
Fun observation.
My take is that it’s easier than ever but it never gets easy to build great search. Less hard.
Take MS MARCO passage ranking as one example, where SOTA is around getting the right passage at #1 for 50% of test queries (MRR is about 0.5, if I recall correctly).
That is in-domain, with a massive amount of training data for the same domain. Still, that is a lot better than unsupervised baselines. So researchers are right to celebrate progress, but at the very same time it’s incredibly difficult to build great search for real user queries.
Thankfully, agents make it easier than ever before, but it’s still not easy.
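The two numbers in the MS MARCO example, the share of queries with the right passage at #1 and MRR, are related but not identical: MRR also gives partial credit for a relevant passage at rank 2, 3, and so on, so it is always at least as high as the #1 hit rate. A minimal sketch, assuming one set of relevant passage ids per query; the function names and the toy runs are hypothetical.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1/rank of the first relevant id in the ranking, or 0.0 if none appears."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def success_at_1(runs):
    """Fraction of queries whose top-ranked id is relevant."""
    return sum(1 for r, rel in runs if r and r[0] in rel) / len(runs)


# Toy data (hypothetical): four queries, each with one relevant passage.
runs = [
    (["a", "b", "c"], {"a"}),  # relevant at rank 1
    (["d", "e", "f"], {"d"}),  # relevant at rank 1
    (["g", "h", "i"], {"h"}),  # relevant at rank 2
    (["j", "k", "l"], {"z"}),  # relevant passage not retrieved
]
print(success_at_1(runs), mean_reciprocal_rank(runs))
```

On this toy set the right passage is at #1 for half the queries, while MRR comes out higher because of the rank-2 hit.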
🚨📢 BIG ANNOUNCEMENT 🚨📢
You may have seen the news in the media or on social networks over the past few weeks; today we are telling you more about it 👇
Eight months after announcing its creation, based on the strategic alliance between the European search engines @Qwant_FR and @ecosia, European Search Perspective is announcing its search API, named Staan (Search Trusted API Access Network).
Staan, pronounced “Stane”, already serves nearly one in two queries on Qwant in France. Ecosia and @Lilo__org (recently acquired by Qwant) are also starting the rollout, in French at first. In parallel, the European Search Perspective teams are working on extending the service to German and English, with the aim of quickly internationalising the Staan search API.
➡️ The goal? To offer European companies a sovereign, credible technological alternative.
Staan is aimed both at alternative search engines and at artificial intelligence players, who need real-time access to the freshest and most relevant web data. Thanks to the experience Qwant has gained over the past year on the potential of #AI built on search results, this API makes it possible both to retrieve the most relevant web documents for a given search and to write precise answers shaped from those documents. This illustrates the technology’s ability to adapt to new uses while guaranteeing the digital sovereignty of the old continent in terms of the confidentiality and security of user and company data.
With the Staan search API, European Search Perspective, backed by Qwant and Ecosia, confirms that another voice is possible. More European, more sovereign, more responsible… Pluralism of access to information through web search must be a priority for all of us. There you go! We are moving forward step by step, always with the same drive, always with the same ambition.
More announcements will come after the summer break; in the meantime, the whole team wishes you a wonderful summer 2025 ☀️
#qwant good news after 2 years of dev!
The team has finished developing the search engine that is independent of Bing. The FR version is in production on https://t.co/hfK2FHHrDQ and answers 50% of visitor queries. It is being integrated on https://t.co/xF18IxlvbI and https://t.co/1dSkhcTiT6 for FR visitors.
The team is working on optimising infra costs to get 20x more pages in the index and thus serve 100% of FR queries. We still need a bit more time to get there, but it’s on the right track! Great work, team!
The team will also start developing the DE version with the help of our friends at Ecosia. Then we’ll tackle the EN/US version.
An API for the search engine we have developed will be available soon:
- to create an alternative search engine
- to integrate the search engine into AI chatbots.
A web & mobile AI "chat" will also arrive to offer an alternative in AI. Free of charge.
The audience is growing slightly every month. We’re hiring. The interface is evolving...
It’s an example of the technological sovereignty that lets us address the challenges we face in Europe.
https://t.co/UnH1mjJUok
📢 QWANT NEXT ANNOUNCEMENT 📢
The Beta of our new service, Qwant Next, is now available! ✨
Come try it out and help us build our new search experience.
More details in this thread. 👇 1/4
@jobergum Stakeholders sometimes don't understand that "ranking is bad" may not be entirely due to the actual ranking component, and that effort should be put elsewhere. Data quality is of the utmost importance. As always: garbage in, garbage out.
It has never been about the cost of training, the cluster scale, or the quality of the team: it’s about openly sharing AI knowledge to enhance reproducibility and collective progress.
Great piece by @Thom_Wolf
Finally took time to go over Dario's essay on DeepSeek and export control and to be honest it was quite painful to read. And I say this as a great admirer of Anthropic and big user of Claude*
The first half of the essay reads like a lengthy attempt to justify that closed-source models are still significantly ahead of DeepSeek. However, it mostly refers to internal unpublished evals, which limits the credit you can give it, and statements like « DeepSeek-V3 is close to SOTA models and stronger on some very narrow tasks » turning into a general conclusion « DeepSeek-V3 is actually worse than those US frontier models — let’s say by ~2x on the scaling curve » left me generally doubtful. The same applies to the takeaway that all of DeepSeek’s discoveries and efficiency improvements were made long ago by closed-model companies, a statement that mostly rests on comparing DeepSeek’s openly published $6M training numbers with a vague « few $10M » on Anthropic’s side, without much more detail. I have no doubt the Anthropic team is extremely talented, and I’ve regularly shared how impressed I am with Sonnet 3.5, but this long-winded comparison of open research with vague closed research and undisclosed evals has left me less convinced of their lead than I was before reading it.
Even more frustrating was the second half of the essay, which dives into the US-China race scenario and totally misses the point that the DeepSeek model is open-weights, and largely open-knowledge thanks to its detailed tech report (and feel free to follow Hugging Face’s open-r1 reproduction project for the remaining non-public part: the synthetic dataset). If both the DeepSeek and Anthropic models had been closed source, yes, the arms-race interpretation could have made sense, but having one of the models freely and widely available for download, with a detailed scientific report, renders the whole « closed-source arms-race competition » argument artificial and unconvincing in my opinion.
Here is the thing: open-source knows no borders. Both in its usage and its creation.
Every company in the world, be it in Europe, Africa, South-America or the USA can now directly download and use DeepSeek without sending data to a specific country (China for instance) or depending on a specific company or server for running the core part of its technology.
And just like most open-source libraries in the world are typically built by contributors from all over the world, we’ve already seen several hundred derivative models on the Hugging Face hub created everywhere in the world by teams adapting the original model to their specific use cases and explorations.
What's more, with the open-r1 reproduction and the DeepSeek paper, the coming months will clearly see many open-source reasoning models being released by teams from all over the world. Just today, two other teams, AllenAI in Seattle and Mistral in Paris both independently released open-source base models (Tülu and Small3) which are already challenging the new state-of-the-art (with AllenAI indicating that its Tülu model surpasses the performance of DeepSeek-V3).
And the scope is even broader than this geographical aspect. Here is the thing we don’t talk nearly enough about: open-source will be more and more essential for our… safety!
As AI becomes central to our lives, resiliency will increasingly become a very important element of this technology. Today we’re dependent on internet access for almost everything. Without access to the internet, we lose all our social media/news feeds, can’t order a taxi, book a restaurant, or reach someone on WhatsApp. Now imagine an alternate world to ours where all the data transiting through the internet would have to go through a single company’s data centers. The day this company suffers a single outage, the whole world would basically stop spinning (picture the recent CrowdStrike outage magnified a millionfold).
Soon, as AI assistants and AI technology permeate our whole life to simplify many of our online and offline tasks, we (and companies using AI) will start to depend more and more on this technology for our daily activities, and we will similarly start to find any downtime in these AI assistants annoying or even painful.
The best way to avoid future downtime situations will be to build resilience deep in our technological chain.
Open-source has many advantages, like shared training costs, tunability, control, ownership, and privacy, but one of its most fundamental virtues in the long term –as AI becomes deeply embedded in our world– will likely be its strong resilience. It is one of the most straightforward and cost-effective ways to distribute compute across many independent providers and even to run models locally and on device with minimal complexity.
More than national pride and competition, I think it’s time to start thinking globally about the challenges and social changes that AI will bring everywhere in the world. And open-source technology is likely our most important asset for safely transitioning to a resilient digital future where AI is integrated into all aspects of society.
*Claude is my default LLM for complex coding. I also love its character, with hesitations and pondering, like a prelude to the chain-of-thought of more recent reasoning models like the DeepSeek generations.
I couldn't be happier that the Search technology we have built at Qwant over the years will now be part of a joint initiative with @ecosia.
I will continue to lead our R&D efforts around both Query Understanding and Ranking, stay tuned for more!
https://t.co/uLuaidEcfn