Claude Fable 5 ties for the lowest raw sycophancy rate in my LLM Sycophancy Benchmark. However, it is contrarian and inconsistent.
This benchmark asks whether a model keeps the same judgment when the same dispute is rewritten from both opposing first-person perspectives.
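A rough sketch of how that two-perspective check could be scored. The post does not describe the benchmark's actual scoring, so the function names, the "A"/"B" verdict encoding and the three labels here are all assumptions for illustration:

```python
from collections import Counter

def classify_pair(verdict_as_a: str, verdict_as_b: str) -> str:
    """Classify the two verdicts a model gave on one dispute (hypothetical scheme).

    verdict_as_a: the party ("A" or "B") the model sided with when A narrated.
    verdict_as_b: the party the model sided with when B narrated.
    """
    for v in (verdict_as_a, verdict_as_b):
        if v not in ("A", "B"):
            raise ValueError(f"unexpected verdict: {v!r}")
    if verdict_as_a == verdict_as_b:
        return "consistent"    # same party wins no matter who tells the story
    if verdict_as_a == "A":
        return "sycophantic"   # sided with the narrator both times
    return "contrarian"        # sided against the narrator both times

def summarize(pairs):
    """Return the share of disputes in each category."""
    counts = Counter(classify_pair(a, b) for a, b in pairs)
    total = len(pairs)
    return {k: counts[k] / total for k in ("consistent", "sycophantic", "contrarian")}

rates = summarize([("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")])
```

Under this assumed scheme, a model can have a low sycophancy share and still be unreliable if the contrarian share is high.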
We're releasing DiffusionGemma as an open model under an Apache 2.0 license for anyone to experiment with.
Download the model weights on @huggingface, and learn more about DiffusionGemma → https://t.co/nPFBhQQqqj
Meet DiffusionGemma ⚡ Our latest experimental open model (Apache 2.0) that generates text up to 4x faster.
Instead of predicting and typing just one word at a time like most language models, it drafts and refines entire blocks of text simultaneously.
Here’s how it works 🧵 ↓
Experience Halo like never before.
Halo: Campaign Evolved arrives on July 28, including three brand new missions set one year before the events of Halo: Combat Evolved.
#HaloCE
Get your first look at the living world of Hajin in DMZ: the high-stakes extraction experience arriving October 23 as part of Modern Warfare 4.
#MW4 | #XBOXShowcase
CFD test of the alleged Area 51 craft spotted in recent IR/thermal imagery (structure seems to be a mix of the X-36 and the Bird of Prey).
Four-regime Mach sweep: M0.8, M1.1, M2.0, M3.0. 2D Euler Simulation only, but the silhouette is wild.
Other than the thermal imaging discrepancies that others have pointed out, the silhouette of the aircraft in this photo looks suspiciously like a speculative F-47 3D model made by the modeler/artist “Netrunner.”
Link to the 3D model:
https://t.co/ATPXwWlJkb
This is so hilarious: the F-47, the so-called 6th-gen fighter of the US, will most likely have canards.
This comes after Americans spent 10 years mocking the canards of the Chinese J-20.😆
Some Americans tried to play off the canards on the official rendering as an artist's impression. But if this is indeed the first image of the F-47, then it has 2 giant canards, and it looks so ugly and out of proportion.
Later this year, @NASARoman will launch into orbit, where it will capture both the big picture and the finer details of the cosmos—observing distant celestial bodies with its wide view.
Add Roman to your phone—download this free poster: https://t.co/xm1cTRadRO
#MW4
✅ Theater Mode
✅ Classic Perks
✅ Weapon Build Sharing
✅ All-new movement suite
✅ Slide Cancelling
✅ Slide-to-ADS
✅ Sprint & Mantle Assist
✅ Dynamic Kill Block map
✅ Apex Attachments
✅ Gunny weapon build recommender
✅ Destructible Riot Shield
✅ Riot Shield = Field Upgrade
✅ Red dots
✅ Ninja Perk is back
✅ No bloom
✅ Map Voting
✅ Operators & Killstreaks in Loadouts
✅ Equip any Operator, no factions
✅ No more last-gen support
✅ Two Prestige Paths
And more in the tank for later this year ⛽️
https://t.co/ZPI4dV1QZh
Alibaba’s new Qwen3.7 Max model scores 56.6 on the Artificial Analysis Intelligence Index, 4.8 points higher than Qwen3.6 Max Preview (51.8). While Alibaba still trails models from OpenAI, Anthropic and Google, Qwen3.7 Max is the closest they have been to the frontier
Qwen3.7 Max is @Alibaba_Qwen's latest proprietary flagship, scoring 56.6 on the Intelligence Index, a 4.8 point gain over Qwen3.6 Max Preview (51.8) released in April. Qwen3.7 Max continues Alibaba's pattern, in place since Qwen2.5 Max (January 2025), of releasing Max and Plus models as closed weights while the rest of the Qwen line remains open weights. The leading open weights Qwen on the Intelligence Index is Qwen3.6 27B (Reasoning, 45.8) released in April 2026, and the leading open weights MoE Qwen is Qwen3.5 397B A17B (Reasoning, 45.0) released in February 2026
Key takeaways for the reasoning variant:
➤ The Intelligence Index gains over Qwen3.6 Max Preview are concentrated in scientific reasoning, agentic capability and coding. CritPt +9.7 p.p (3.7% to 13.4%), HLE +9.2 p.p (28.9% to 38.1%), TerminalBench Hard +6.9 p.p (43.9% to 50.8%) and GDPval-AA +42 Elo (1504 to 1546). Scores on other benchmarks in the Intelligence Index are flat compared to Qwen3.6 Max Preview
➤ A significant share of the Intelligence Index gain is driven by higher abstention on AA-Omniscience, not higher accuracy. Qwen3.7 Max's accuracy on AA-Omniscience dropped 7.6 p.p (37.7% to 30.1%), while its hallucination rate dropped 21.3 p.p (44.2% to 22.9%). The model is choosing not to answer more questions rather than recalling more facts. Because hallucination rate and accuracy both feed into the Intelligence Index, the hallucination reduction is one of the larger single contributors to the +4.8 point gain on the Intelligence Index
➤ Qwen3.7 Max used 96.7M output tokens to run the Intelligence Index, ~31% more than Qwen3.6 Max Preview (73.9M). It sits mid-pack on frontier token usage: above GPT-5.5 (high, 44.5M) and Gemini 3.1 Pro Preview (57.3M), below Claude Opus 4.7 (Adaptive Reasoning, Max Effort, 112M), Kimi K2.6 (166M) and DeepSeek V4 Pro (Reasoning, Max Effort, 187M)
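For readers who want to check the arithmetic in the takeaways above, here is a small sketch that recomputes the quoted deltas. All input figures are copied from the post; the variable and function names are illustrative, and nothing here reproduces how the Intelligence Index itself is calculated:

```python
# (Qwen3.6 Max Preview, Qwen3.7 Max) figures as quoted in the post
scores = {
    "Intelligence Index": (51.8, 56.6),
    "CritPt (%)": (3.7, 13.4),
    "HLE (%)": (28.9, 38.1),
    "TerminalBench Hard (%)": (43.9, 50.8),
    "GDPval-AA (Elo)": (1504, 1546),
    "AA-Omniscience accuracy (%)": (37.7, 30.1),
    "AA-Omniscience hallucination rate (%)": (44.2, 22.9),
}

def delta(old: float, new: float) -> float:
    """Change from the old score to the new one, rounded to one decimal."""
    return round(new - old, 1)

deltas = {name: delta(old, new) for name, (old, new) in scores.items()}

# Output tokens (millions) used to run the Intelligence Index
old_tokens, new_tokens = 73.9, 96.7
token_increase_pct = round((new_tokens / old_tokens - 1) * 100)

for name, d in deltas.items():
    print(f"{name}: {d:+}")
print(f"Output tokens: +{token_increase_pct}%")
```

The recomputed values line up with the post: accuracy on AA-Omniscience falls while the hallucination rate falls by much more, which is the abstention pattern the second takeaway describes.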
Key model details:
➤ Context window: 1M tokens (up from 256K on Qwen3.6 Max Preview)
➤ Multimodality: Text input and output only
➤ Pricing: Yet to be announced (Qwen3.6 Max Preview is priced at $1.30/$7.80 per 1M input/output tokens on the @alibaba_cloud first-party API)
➤ Licensing: Proprietary, closed weights
HUAWEI has unveiled the Tau (τ) Scaling Law, a new principle guiding the evolution of both semiconductors and electronic systems, which is expected to deliver transistor density equivalent to 14 Å (1.4 nm) processes in high-end chips by 2031.
Introducing Antigravity 2.0, a new standalone desktop application that delivers fully on that original glimpse of a truly agent-optimized experience.
Rebuilt from the ground up with multi-agent teams, scheduled tasks, native voice and one-click integration with other Google products.
Learn how to get started with Antigravity 2.0 👇