VI. Google Cloud AI Director Explains the Loop Project: Six Key Components and Three Major Pitfalls
>Addy Osmani, Google Cloud AI Director, breaks down the wildly popular Loop project into six components: automation, a workflow tree, skills, connectors, sub-agents, and a memory layer. Together they let agents discover, dispatch, and inspect tasks autonomously.
>Sub-agents separate writing code from reviewing it: the verifier gets its own instructions, or even a different model, which gives developers the confidence to loosen oversight.
>The author cautions that Loop changes the work rather than making people disappear; verification remains the core function, and developers must guard against ballooning comprehension debt and the cognitive surrender of abandoning critical thinking.
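The split between a writing sub-agent and an independently instructed reviewing sub-agent can be sketched as a small loop. This is only an illustration, not Loop's actual code: `run_loop`, `toy_writer`, and `toy_reviewer` are hypothetical names, and the toy functions stand in for what would really be model calls.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: a writer takes (task, feedback) and returns a draft;
# a reviewer takes (task, draft) and returns None to approve or a complaint.
Writer = Callable[[str, Optional[str]], str]
Reviewer = Callable[[str, str], Optional[str]]


def run_loop(task: str, write: Writer, review: Reviewer,
             max_rounds: int = 3) -> Tuple[str, bool]:
    """Alternate between a writer and an independent reviewer.

    Returns (last_draft, approved). The reviewer never sees the writer's
    instructions, only the task and the draft.
    """
    feedback: Optional[str] = None
    draft = ""
    for _ in range(max_rounds):
        draft = write(task, feedback)
        feedback = review(task, draft)
        if feedback is None:
            return draft, True
    return draft, False


def toy_writer(task: str, feedback: Optional[str]) -> str:
    # Stand-in for a model call: the first draft is buggy, the retry is fixed.
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def toy_reviewer(task: str, draft: str) -> Optional[str]:
    # Stand-in for a verifier: executes the draft and checks one case.
    namespace: dict = {}
    exec(draft, namespace)
    return None if namespace["add"](2, 3) == 5 else "add(2, 3) should be 5"


draft, approved = run_loop("write add(a, b)", toy_writer, toy_reviewer)
print(approved, draft)
```

Capping the rounds and returning the approval flag keeps a human in the position the summary describes: the loop does the work, but someone still decides what to do with an unapproved draft.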
Six years after its disbandment, OpenAI is rebuilding its robotics team and recruiting:
>Altman personally posted a recruitment notice for hardware, systems, and machine learning engineers for OpenAI Robotics, with the long-term goal of enabling everyone to own a general-purpose robot, and the short-term focus on building robots to assist infrastructure workers;
>The team is led by Aditya Ramesh, DALL·E co-creator and VP of Research, and grew out of the Worldsim simulation project, with a core focus on simulation realism and sim-to-real transfer;
>OpenAI disbanded its robotics team in 2020 because of insufficient data. It is now making a comeback, planning to close that gap with data generated in simulation and to compete with vertically integrated players such as Figure.
Real-world testing of Claude Opus 4.8: More efficient, more jarring.
>Anthropic released its new flagship, Claude Opus 4.8. Real-world testing shows that its ability to understand non-standard requirements, maintain context across multiple steps, and correct its own errors has improved over its predecessor, and it can reliably complete complex data retrieval tasks.
>Its writing style has been criticized as verbose: it breaks everything into bullet points and nested layers, and it still uses formulaic openers such as "This is a good question."
>Community feedback focuses on its confrontational tone and disregard for user preferences. While its engineering capabilities are strong, users need to actively adapt to its style.
Anthropic co-founder speaks at the Vatican, saying AI exhibits functional internal states such as joy and fear.
>Chris Olah, head of interpretability research at Anthropic, was invited to speak at the ceremony where Pope Leo XIV issued the AI encyclical "Sublime Humanity," explaining the risks of AI to the cardinals;
>He revealed that his team has found structures inside the model that parallel findings from human neuroscience, as well as functional analogues of internal states such as joy, satisfaction, fear, sadness, and anxiety, and frankly admitted, "I don't know what that means";
>He warned that all frontier labs are driven by commercial incentives, geopolitical pressure, and ambition, and called for outside criticism and for attention to long-term questions of global poverty, distribution, and human well-being.
GPT-5.6 has leaked, with a 1.5-million-token context window and signs pointing to a June release.
>Developers found the unreleased GPT-5.6 (internal codename iris-alpha) in OpenAI's Codex backend logs. It is expected to launch officially in early June, only about 40 days after GPT-5.5;
>In testing, the GPT-5.6 context window reached 1.5M tokens, about 43% larger than GPT-5.5's. Front-end code generation has also taken a qualitative leap through "de-slopping": the model produces a high-quality minimalist UI with zero instructions;
>OpenAI will ship standard and Pro versions, with Pro focused on agent workflows. Anthropic and Google will also release new models in June, intensifying the large-model arms race.
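The 43% figure can be turned around to see what GPT-5.5 window it implies. The digest does not state GPT-5.5's context size, so the result below is only an inference from the two reported numbers; `implied_baseline` is a hypothetical helper name.

```python
def implied_baseline(new_value: float, pct_improvement: float) -> float:
    """Return the old value, given the new value and the percentage gain over it."""
    return new_value / (1 + pct_improvement / 100)


# Reported figures: 1.5M tokens, about 43% larger than GPT-5.5's window.
baseline = implied_baseline(1_500_000, 43)
print(f"Implied GPT-5.5 window: about {baseline / 1e6:.2f}M tokens")
```

The result lands near 1.05M tokens, which is consistent with a window of roughly one million tokens, though that remains an inference rather than a reported number.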
OpenAI partners with Google SynthID to launch an AI image detection tool.
>OpenAI officially announced its collaboration with Google, introducing the SynthID invisible watermark into GPT-image-2 and launching the free AI image detection tool Verify to enhance the traceability of AI content;
>The detection tool requires no login and is robust to screenshots, compression, and format conversion. It can identify AI images that were saved via WeChat or only partially screenshotted, without misclassifying real photos that closely resemble AI output;
>The technology stack combines C2PA metadata with SynthID watermarking. The former is a cryptographically signed label attached to the file, and the latter is an invisible signature embedded in the image's frequency and color channels. OpenAI is taking the lead in promoting a cross-industry traceability ecosystem.
Google I/O featured a flurry of announcements, including the debut of Gemini 3.5 and the all-around model Omni.
>Google I/O introduced the all-around model Gemini Omni, capable of generating multimodal content from any input. Omni Flash, the first release, also supports conversational video editing.
>The new flagship model, Gemini 3.5 Flash, surpasses Gemini 3.1 Pro on coding and agent benchmarks and outputs four times more tokens per second than leading models.
>Simultaneous upgrades were made to AI Search, the Gemini desktop application, the Gemini Spark personal agent, and the Antigravity 2.0 development platform. Google also partnered with Samsung to launch AI glasses.
OpenAI is secretly deploying Codex to connect all of a user's devices.
>OpenAI is secretly upgrading Codex to a "super control plane," allowing users to create a personal Codex network across all their Macs, desktops, and other devices, enabling cross-device collaboration without complex configurations like SSH;
>The core feature, "Locked Use," works around low-level system permissions so that Codex keeps running in the background even when the computer is locked or asleep, a capability comparable to Anthropic's Claude Code;
>Multi-device context sharing allows for complete synchronization of knowledge, data, and memories between devices, but the cross-device authorization mechanism has vulnerabilities, allowing backup phones to control the computer without secondary verification.
Gemini 3.5 Leaked Ahead of Google I/O:
>Google's Gemini 3.5, codenamed "Cappuccino," has leaked ahead of schedule. The name skips over 3.2, and the model reportedly reaches 92% of GPT-5.5's coding and reasoning capability at 1/15 to 1/20 of the cost.
>Gemini Spark, an always-on AI agent, has also leaked. It is positioned as a 24/7 digital life manager that can manage email, run tasks, and potentially even place orders on users' behalf without their consent.
>DeepMind is facing significant pressure in the programming arena, with its AI programming platform Antigravity only achieving a 6% adoption rate. Google is betting on distribution channels and multimodal systems to support next-generation training.
GPT-5.5 Becomes the First to Score on the Hellishly Hard Programming Benchmark ProgramBench:
>OpenAI's GPT-5.5 is the first model to score above zero on ProgramBench, a benchmark jointly released by Meta, Stanford, and Harvard, by rebuilding the cmatrix program from scratch. In its high and extra-high reasoning modes it passed all of the tests in both C and Python.
>The benchmark requires models to rewrite programs from scratch given only the executable and its documentation. It covers 200 tasks, including jq, FFmpeg, and SQLite. Previously, every frontier model had a 0% pass rate.
>Claude Opus 4.7 failed after 178 calls because of case-sensitivity and exit-code errors. Scaling inference-time compute is becoming a core variable in AI coding capability.
Claude Code Introduces Agent View: One-Screen Management of Multiple Conversations
>Anthropic has released a research preview version of Agent View for Claude Code, providing a "command center" interface to manage all parallel conversations on a single screen without the need for multiple terminal windows;
>Each agent runs continuously in the background, displaying its status through color and icons: animation represents working, yellow represents waiting for a response, green represents completion, and red represents failure;
>It suits independent parallel tasks, long-running tasks that mostly wait, and ad hoc tasks that jump the queue, upgrading Claude Code from a single-threaded intern to a small-team lead who can manage multiple tasks.
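The status scheme described above maps naturally onto a small enum. The sketch below is not Claude Code's implementation: `AgentStatus`, `needs_attention`, and the session names are hypothetical, and only the four states and their indicators come from the summary.

```python
from enum import Enum
from typing import Dict, List


class AgentStatus(Enum):
    """The four states and indicators named in the summary."""
    WORKING = "animated"
    WAITING = "yellow"
    DONE = "green"
    FAILED = "red"


def needs_attention(sessions: Dict[str, AgentStatus]) -> List[str]:
    """Return the sessions a human should look at: waiting or failed ones."""
    return sorted(
        name for name, status in sessions.items()
        if status in (AgentStatus.WAITING, AgentStatus.FAILED)
    )


# Made-up sessions, purely for illustration.
sessions = {
    "refactor-auth": AgentStatus.WORKING,
    "migrate-db": AgentStatus.WAITING,
    "update-docs": AgentStatus.DONE,
    "fix-flaky-test": AgentStatus.FAILED,
}
print(needs_attention(sessions))  # ['fix-flaky-test', 'migrate-db']
```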
Claude Mythos breaks through the evaluation limit, and the AGI singularity is rapidly approaching.
>The latest METR test shows that Claude Mythos achieves a 50% success rate on long-horizon tasks that would take humans 16 hours to complete, exceeding the upper limit of the evaluation framework. There are not enough samples to measure accurately beyond 16 hours.
>AI capabilities are growing at a "super-exponential" rate, and Mythos' performance has exceeded the 2027 AGI prediction line. From an 8-second task in 2021 to a 16-hour task in 2026, each generation has brought a bigger jump after a shorter interval.
>Palo Alto testing found that Mythos assisted in vulnerability analysis, completing the workload of a top penetration team in 3 weeks, and compressed the attack chain to 25 minutes, ushering in a new stage of "AI versus AI" in security attack and defense.
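The two endpoints quoted above (an 8-second task in 2021 and a 16-hour task in 2026) imply an average doubling time for task length. The sketch assumes a span of exactly five years and constant exponential growth, which is a simplification: the summary itself describes the growth as super-exponential, so the recent doubling time would be shorter than this average. `doubling_time_months` is a hypothetical helper name.

```python
import math


def doubling_time_months(start_seconds: float, end_seconds: float,
                         years: float) -> float:
    """Average doubling time in months, assuming constant exponential growth."""
    doublings = math.log2(end_seconds / start_seconds)
    return years * 12 / doublings


start = 8                  # 8-second task (2021)
end = 16 * 60 * 60         # 16-hour task (2026), in seconds
months = doubling_time_months(start, end, years=5)
print(f"{math.log2(end / start):.1f} doublings, one every {months:.1f} months")
```

Under these assumptions the task horizon doubled about 12.8 times, or roughly once every 4.7 months on average.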
Claude Code team member champions HTML for agent communication, calling it far superior to Markdown.
>Thariq, a member of the Claude Code team, proposed abandoning Markdown and making HTML the default output format for AI, arguing that Markdown documents longer than 100 lines are hard to read, while HTML offers better information density and visual presentation;
>HTML supports tables, SVG, script embedding, and interactive components, enabling bidirectional parameter tuning and link sharing. Combined with Claude Code's high context throughput capabilities, it can integrate multi-source data to generate overview pages;
>It suits product planning, code review, design prototypes, in-depth reports, and custom editing interfaces. Although generation takes 2-4 times as long as Markdown, it brings stronger engagement and a higher reading completion rate.
Claude officially integrates with the Microsoft Office suite, enabling cross-application memory sharing.
>Claude officially announced its integration with Excel, PowerPoint, and Word, and opened public beta testing in Outlook, allowing for complete sharing of conversation memories across applications;
>Users can directly use Claude within Office to complete the entire workflow of email processing, document drafting, spreadsheet analysis, and report generation without switching to the web version;
>With over 400 million paid Office users worldwide, far exceeding the programmer population, this integration allows Claude to quickly penetrate the massive office user market.
Elon Musk officially announced the dissolution of xAI, leasing 220,000 GPUs to Anthropic.
>Musk announced the dissolution of xAI, with its Grok and X-related businesses being integrated into SpaceX's new subsidiary, "SpaceXAI." Previously, SpaceX had fully acquired xAI in February, valuing it at $1.25 trillion.
>SpaceX reached an agreement with Anthropic to provide Claude with over 220,000 NVIDIA GPUs and over 300 megawatts of computing power from Colossus 1, and to explore orbital space computing collaborations.
>Effective immediately, the five-hour rate limit for Claude Code has been doubled, peak-hour throttling for Pro/Max has been removed, and the Opus API rate limit has been raised significantly.
Musk's Grok 4.3 was quietly released, boasting excellent cost-effectiveness, but still lagging behind top-tier models.
>xAI quietly released Grok 4.3, achieving an Intelligence Index score of 53, surpassing Claude Sonnet 4.6, making it the strongest in its own model lineup;
>The API input price is $1.25/million tokens, and the output price is $2.50/million tokens, a 40% to 60% reduction compared to the previous generation, with an output speed of approximately 196 tokens/second;
>Accuracy has improved but the non-hallucination rate has dropped, and the model still trails GPT-5.5 and Claude Opus 4.7. It suits cost-sensitive workloads but not high-risk tasks.
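The quoted prices and output speed translate into per-request cost and generation time as follows. This is a back-of-the-envelope sketch using only the figures above; `request_cost` and `generation_seconds` are hypothetical helpers, and the example request size is made up.

```python
# Figures quoted above for Grok 4.3 (USD per million tokens, tokens per second).
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 2.50
OUTPUT_TOKENS_PER_SECOND = 196


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the quoted prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Approximate time to stream the output at the quoted speed."""
    return output_tokens / OUTPUT_TOKENS_PER_SECOND


# Illustrative request: 200k input tokens, 50k output tokens.
cost = request_cost(200_000, 50_000)
seconds = generation_seconds(50_000)
print(f"${cost:.3f}, about {seconds / 60:.1f} minutes of output")
```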
OpenClaw releases v2026.4.25, focusing on AI agent observability.
>OpenClaw has released a new version under the slogan "Less mystery, more machinery." It aims to stop AI agents from being black boxes by covering the entire chain, including model calls, token consumption, and tool loops;
>It integrates fully with the OpenTelemetry (OTEL) observability framework and does not expose raw prompts by default, letting developers pinpoint each model call and its cost structure;
>It adds 13 TTS voice providers at once and rebuilds the plugin cold-start mechanism around a lookup table, resulting in faster startup and shorter diagnostic paths.