A deep dive into how Claude Code and OpenClaw unleashed the AI agent revolution that is rapidly transforming the modern computing landscape (@stevenlevy / Wired)
Amazon and @Stanford researchers collaborated to develop cvc5, an open-source software tool that powers Automated Reasoning checks in Amazon Bedrock and other AWS services. The tool now processes ~1B solver calls daily to enhance security for customers: https://t.co/xPVpUgFPbw
Before an AI agent can book your vacation, it must learn to scroll, click, tab, and navigate other low-level tasks. Amazon's AGI Lab is building "reinforcement learning gyms" where agents practice atomic behaviors, mastering mundane interactions that underpin reliable software operation: https://t.co/q0EKQyYYAU
Breaking 💥 Introducing OpenNovelty: An agentic system engineered to redefine how we evaluate academic novelty!
In an era of exponential @arxiv growth, expecting human reviewers to recall every related work is becoming impossible. We need more than human memory. We need a trustable, verifiable agentic system.
That's why we're here:
The 10 most viewed publications from Amazon researchers in 2025 include foundation model safety frameworks, formal verification at cloud scale, advanced robotics, and multimodal AI reasoning. https://t.co/L357g7ympr
Earlier this year, 24-year-old @aliniikk was running an AI recruiting startup. @micro1_ai pivoted into AI training, and 8 months later the company is making $100 million a year and fielding offers at a $2.5 billion valuation.
https://t.co/Us7BjiikG5
"How do we get AI models to think more flexibly, like humans do?" Cognitive scientist @drperszyk is at NeurIPS this week talking about the frontier of adaptive intelligence and social world models. Learn more about her team's work at Amazon's AGI Lab: https://t.co/gqryKFTm6R #NeurIPS2025
Netflix quietly removes support for casting from its mobile app to most modern TVs and streaming devices, including Chromecast, regardless of subscription plan (@adamya_s / Android Authority)
https://t.co/d8eyNpgs2W
https://t.co/7pKX8dUIka
Whether you are in academia or at a frontier lab, this is a pivotal moment to redefine the standard for LLMs. Next token prediction needs a revamp in this age of research.
At this year's #NeurIPS2025, my students will be presenting 4 papers at the main conference, including 3x spotlights on diffusion LLMs. Details in 🧵below.
Realizing this vision in the real world requires top-notch research, focus, and execution at scale. In under a year, our team @_inception_ai released Mercury, a frontier dLLM with latencies far lower than those of frontier LLMs. If you are attending NeurIPS and this vision excites you, DM me and a member of our team will reach out.
🚀 Thrilled to launch DeepScholar, an openly-accessible DeepResearch system we've been building at Berkeley & Stanford.
DeepScholar efficiently processes 100s of articles, demonstrating strong long-form research synthesis capabilities, competitive with OpenAI's DR, while running up to 2x faster!
Try it out: https://t.co/f581krydQh
Ilya Sutskever just said that when it comes to AI models, we are back at the age of research & ending the age of scaling.
What he is telling us is that more compute at this point won't help us get much better models; we need new breakthroughs.
Not something that the semi companies like $NVDA, $AMD want to hear TBH.
Releasing a new "Agentic Reviewer" for research papers. I started coding this as a weekend project, and @jyx_su made it much better.
I was inspired by a student who had a paper rejected 6 times over 3 years. Their feedback loop -- waiting ~6 months for feedback each time -- was painfully slow. We wanted to see if an agentic workflow can help researchers iterate faster.
When we trained the system on ICLR 2025 reviews and measured Spearman correlation (higher is better) on the test set:
- Correlation between two human reviewers: 0.41
- Correlation between AI and a human reviewer: 0.42
This suggests agentic reviewing is approaching human-level performance.
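For readers who want to see what the Spearman comparison above measures, here is a minimal sketch in plain Python. It is not the Agentic Reviewer's evaluation code: the function names and the review scores are hypothetical, made up only for illustration. Spearman correlation is the Pearson correlation of the ranks, with tied scores sharing the average of their rank positions.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical scores for six papers (NOT data from the post).
human_a = [6, 3, 8, 5, 5, 1]
human_b = [5, 3, 6, 8, 3, 1]
ai_reviewer = [6, 5, 8, 3, 6, 3]

print("human vs human:", round(spearman(human_a, human_b), 2))
print("AI vs human:   ", round(spearman(ai_reviewer, human_a), 2))
```

A value of 1 means two reviewers rank the papers identically, 0 means no monotonic relationship, and -1 means exactly opposite rankings, so figures near 0.4 indicate only moderate agreement in either pairing.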
The agent grounds its feedback by searching arXiv, so it works best in fields like AI where research is freely published there. It’s an experimental tool, but I hope it helps you with your research.
Check it out here: https://t.co/n7ctnDilJJ