I've been saying that DeepSeek will expand from verifiable to general domains, and I expected a paper. Here is that paper: Self-Principled Critique Tuning. Rule-based online RL. Gemma-2 27B is enough to match R1.
This is roughly what Google does for Gemma 3 and likely the Gemini models.
@GeminiApp 2.5 is the best overall model. However, it struggles with long, detailed prompts, particularly in reporting tasks. In contrast, DeepSeek R1 and QwQ process the same input accurately. Gemini frequently misinterprets formatting and fails to convert shortcodes correctly.
NotebookLM has always been grounded in your sources. However, sometimes we all need a little more info. Now NBLM can help you discover new sources to expand your research.
Just type in what you want to learn about and we'll scour the web for the best the internet has to offer 🕵️‍♀️