LinMem write-up is out 🧠⚡
Agent memory should not stop at “vector DB glue”.
If agents keep getting longer tasks, richer trajectories, and more persistent context, we need memory mechanisms that are more native to sequence modeling itself 🔁
This write-up explores:
🧩 linear attention as parametric long-term memory
🔍 softmax-style reasoning for deciding when to query memory
📚 memory as a trainable mechanism, not just retrieval middleware
🤖 how this could fit future agent loops and long-horizon workflows
The key bet: the next useful agent memory layer may look less like a folder of embeddings, and more like a learnable state system ⚙️🧠
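To make the "linear attention as memory" idea concrete, here is a minimal sketch of the generic linear-attention recurrence used as a key-value store. It is not the LinMem implementation: the class name, the `elu(x) + 1` feature map, and the normalization are assumptions chosen for illustration.

```python
import math


def feature_map(x):
    # Positive feature map elu(x) + 1 (an assumed choice, common in linear attention).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


class LinearAttentionMemory:
    """Hypothetical fixed-size memory: state S (d_key x d_value) plus normalizer z."""

    def __init__(self, d_key, d_value):
        self.S = [[0.0] * d_value for _ in range(d_key)]
        self.z = [0.0] * d_key

    def write(self, key, value):
        # S += phi(k) v^T and z += phi(k); cost does not grow with history length.
        phi = feature_map(key)
        for i, p in enumerate(phi):
            self.z[i] += p
            for j, v in enumerate(value):
                self.S[i][j] += p * v

    def read(self, query):
        # Output = phi(q)^T S / (phi(q)^T z): a similarity-weighted average of stored values.
        phi = feature_map(query)
        denom = sum(p * z for p, z in zip(phi, self.z))
        d_value = len(self.S[0])
        if denom == 0.0:
            return [0.0] * d_value  # nothing has been written yet
        return [
            sum(phi[i] * self.S[i][j] for i in range(len(phi))) / denom
            for j in range(d_value)
        ]
```

The state has a fixed size no matter how many items are written, which is what makes it look like a "learnable state system" rather than a growing folder of embeddings. In a trained model the keys, values, and queries would come from learned projections, which this sketch leaves out.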
Notes + links:
https://t.co/zcnyxoNfgP
Agents need their own lane to use real Mac apps without stealing your mouse.
Open Claudex Computer Use is an open-source macOS execution layer for Claude Code, Codex, and MCP agents: AX + screenshots + virtual cursor.
https://t.co/Ndvicdsybe
Agent-native Slides best practice is out 🎞️🤖
Most AI slide failures are not “model failures”.
They are workflow failures: messy source material, no mother draft, fragile PPT editing, and no reproducible path from idea → deck 🧱
My current best practice:
📝 Feishu Docs + lark-cli as the mother draft layer
💻 Slidev for code-first, version-controlled decks
🧩 Agent skills / MCP tools for structured generation
📦 PPTX only at the final delivery/export stage
The key bet: in the Agent era, slides should be treated like software artifacts.
Readable source, editable structure, reproducible builds, and human review at the right nodes ⚙️✨
Full write-up:
https://t.co/HPs6Jsz8a4
📝🚀 Introducing OpenReview Agent
Submitting or resubmitting papers through OpenReview often involves a lot of repetitive but high-risk work:
author profiles, venue-specific fields, keywords, code/data links, LLM usage declarations, checklists, reviewer suggestions, anonymity rules, and schema requirements.
These details are easy to underestimate, until something is missing, mismatched, or formatted incorrectly near a deadline 😵‍💫
So we built **OpenReview Agent**:
an AI agent skill + CLI toolkit for safer, dry-run-first OpenReview submission workflows.
The goal is not to blindly automate paper submission.
The goal is to help researchers and AI agents inspect, validate, transfer, and prepare submission payloads safely before anything is written to OpenReview.
It can help with:
🧩 Inspect existing submissions
👤 Match author OpenReview profiles
🔁 Plan cross-venue transfers
📋 Validate target venue schemas
🧪 Generate dry-run submission payloads
📦 Batch-create submission drafts
🔐 Catch author ID, anonymity, schema, typo, and formatting issues early
The workflow is deliberately conservative:
🔍 inspect first
🧭 plan second
🧪 dry-run third
✅ apply only with explicit confirmation
🔁 verify after writing
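The conservative workflow above can be sketched as a small gate in front of any write. This is an illustration of the dry-run-first pattern only, not the OpenReview Agent code or the OpenReview API: `REQUIRED_FIELDS`, `validate`, `submit`, and the `write` / `read_back` callables are all hypothetical names, and the `~` prefix check is an assumed stand-in for real profile matching.

```python
from dataclasses import dataclass, field

# Hypothetical schema: a real venue defines its own required fields.
REQUIRED_FIELDS = ("title", "authorids", "keywords")


@dataclass
class Result:
    status: str
    issues: list = field(default_factory=list)


def validate(payload, required=REQUIRED_FIELDS):
    # Collect every problem instead of stopping at the first one.
    issues = [f"missing field: {name}" for name in required if not payload.get(name)]
    for author_id in payload.get("authorids", []):
        # Assumed rule for this sketch: profile ids start with "~".
        if not author_id.startswith("~"):
            issues.append(f"author id does not look like a profile id: {author_id}")
    return issues


def submit(payload, write, read_back, confirm=False):
    issues = validate(payload)
    if issues:
        return Result("invalid", issues)
    if not confirm:
        # Default path: report what would be written, touch nothing.
        return Result("dry_run")
    write(payload)
    # Verify after writing by comparing the stored copy with the payload.
    if read_back() != payload:
        return Result("verify_failed", ["stored copy differs from payload"])
    return Result("applied")
```

Calling `submit(payload, write, read_back)` without `confirm=True` never reaches `write`, so the safe path is the default and a write needs an explicit opt-in.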
We hope this can reduce low-value procedural work and let researchers spend more attention on the paper itself.
Still early: `0.1.0-alpha`.
💻 Code:
https://t.co/Uoyth07Os8
Feedback, issues, and contributions are very welcome 🙌
#OpenReview #NeurIPS #ECCV #AIagents #ResearchTools #AcademicPublishing #PeerReview #OpenSource
🖥️ Giving GUI Agents a background lane on macOS
We recently open-sourced a new project:
📌 Open Claudex Computer Use 🖥️✨
An open-source Computer Use MCP Server for macOS.
The problem we wanted to solve is very concrete:
For many GUI agents, the biggest pain point is not whether they can click buttons.
It is that once they start operating your computer, they take over your foreground screen, mouse, and keyboard.
That creates an awkward situation:
When the agent is using my computer,
I can no longer use my computer.
Humans and GUI agents cannot truly coexist this way.
So with Open Claudex Computer Use, we are trying to give agents a relatively independent GUI “background lane” 🚗
The agent can operate real macOS apps in the background:
read app state, observe screenshots, click, type, scroll, drag, and interact with UI elements.
At the same time, it shows what it is doing through a virtual cursor, instead of directly hijacking your physical mouse.
In other words:
The agent no longer has to stand in the middle of your foreground screen to get work done.
You can keep doing your own work, while the agent operates real apps on another “track”.
This initial version includes:
🧩 macOS app state reading
📸 Screenshots and visual observation
🖱️ Click / scroll / drag
⌨️ Text input and keyboard actions
🧭 Virtual cursor visualization
🛠️ Claude Code / Codex / MCP client integration
🍎 Support for Safari, Notes, Finder, TextEdit, Calculator, and other real Mac apps
More importantly, the community has been missing an open-source macOS Computer Use execution layer.
Official computer-use capabilities are not fully open-source, so we built an open implementation that developers can try, modify, and plug into their own agent workflows.
The project is currently at 0.1.0-alpha, so it is best suited for developers, MCP builders, AI agent researchers, and macOS automation enthusiasts who want to experiment early.
💻 Code:
https://t.co/ekOg6GawbB
If you believe future agents should not only answer questions, but actually use computers together with humans in a more cooperative way, we would love for you to try it, open issues, and share feedback 🙌
#OpenClaudex #GUIAgent #MCP #ComputerUse #AIAgent #macOS #OpenSource
Claw-Eval-Live is out 🦞 — a live extension of the Claw-Eval Family!
This live release includes:
105 tasks | 17 workflow families | 13 frontier models tested | quarterly refresh from real ClawHub marketplace signals.
Instead of relying on a static task set, Claw-Eval-Live keeps agent evaluation aligned with evolving real-world enterprise workflows.
A key finding: the bottleneck is not terminal use or environment setup, but cross-system business workflows that require evidence-grounded execution.
Built together with @_TobiasLee and the Claw-Eval core author team, extending the Claw-Eval family into a live benchmark for evolving real-world workflows!
🤗 HF Paper: https://t.co/ZRsFBflS61
Arxiv Paper: https://t.co/lXwJootTve
Leaderboard: https://t.co/SGvtrDt3iU
Code: https://t.co/nDOSIuAub0
🧵 Here are our findings:
🦋DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
Excited to share our work at #NeurIPS2025!
- A large-scale 4D + instance + semantics + caption dataset with 100K in-the-wild scenes, supporting 4D world modeling by combining classic 3D reconstruction with feed-forward methods.
- A novel automated data curation pipeline designed to generate physically-aware multi-modal 4D data at scale.
Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models
Contributions:
• We propose DIFF4SPLAT, a unified diffusion-based model that directly generates deformable 3D Gaussians for controllable 4D scene synthesis.
• We construct a large-scale 4D dataset from synthetic and in-the-wild videos, annotated with appearance, metric-scale geometry, and motion.
• Extensive experiments demonstrate that DIFF4SPLAT produces high-fidelity 4D scenes from a single image, outperforming two-stage pipelines and existing camera-controlled video generation methods in both quality and efficiency.
Mark Zuckerberg on the best advice Peter Thiel ever gave him
“Peter was the person who told me this really pithy quote that, ‘In a world that’s changing so quickly, the biggest risk you can take is not taking any risk.’ And I really think that that is true.”
Mark continues:
“Whenever you get yourself into a position where you have to make some big shift in direction or do something, there are always people who are going to point to the downside risks of that decision — and locally they may be right. For any given decision you make, there’s upside and downside. But in aggregate, if you are stagnant and you don’t make those changes, then I think you’re guaranteed to fail and not catch up. So to some degree, I think it’s really right that, over time, the biggest risk you can take is to not take any risks.”
Video source: @ycombinator (2016)
I really like the term “context engineering” over prompt engineering.
It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.
+1 for "context engineering" over "prompt engineering".
People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step. Science because doing this right involves task descriptions and explanations, few shot examples, RAG, related (possibly multimodal) data, tools, state and history, compacting... Too little or of the wrong form and the LLM doesn't have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down. Doing this well is highly non-trivial. And art because of the guiding intuition around LLM psychology of people spirits.
On top of context engineering itself, an LLM app has to:
- break up problems just right into control flows
- pack the context windows just right
- dispatch calls to LLMs of the right kind and capability
- handle generation-verification UIUX flows
- a lot more - guardrails, security, evals, parallelism, prefetching, ...
So context engineering is just one small piece of an emerging thick layer of non-trivial software that coordinates individual LLM calls (and a lot more) into full LLM apps. The term "ChatGPT wrapper" is tired and really, really wrong.
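As one small illustration of "pack the context windows just right", here is a minimal sketch of budgeted context packing. The names `estimate_tokens` and `pack_context`, the priority scheme, and the whitespace token estimate are all assumptions. A real app would use the model's tokenizer and a much richer relevance signal.

```python
def estimate_tokens(text):
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return len(text.split())


def pack_context(items, budget):
    """Greedily keep the highest-priority items that fit within the budget.

    items: list of (priority, label, text); a higher priority is packed first.
    Returns the chosen (label, text) pairs and the estimated tokens used.
    """
    chosen, used = [], 0
    for priority, label, text in sorted(items, key=lambda item: -item[0]):
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # too much context: skip this item, keep trying smaller ones
        chosen.append((label, text))
        used += cost
    return chosen, used
```

Greedy packing is a deliberately simple choice. It shows the trade-off in the post (too little context versus too much) without claiming to be how any particular LLM app does it.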
Diffusion video models but now - **realtime**!
Simple video filters are real-time but can only do basic re-coloring and styles. Video diffusion models (Veo and friends) are magic, but they take many seconds/minutes to generate. MirageLSD is real-time magic. Unlike simple video filters, diffusion models actually *understand* what they are looking at, so they can style all parts of the feed intelligently (e.g. putting hats on heads, or light sabers into hands, etc.). And they are arbitrarily steerable, e.g. by text prompts.
Customizable, intelligent video filters unlock many cool ideas over time:
- transform camera feeds into alternate realities
- direct and shoot your own movies, acting out scenes with props. Realtime => instant feedback/review.
- vibe code games around just simple spheres/blocks, then use a real-time diffusion model to texture your game to make it beautiful.
- style and customize any video feed: games, videos, ... e.g. Skyrim but "MORE EPIC"? DOOM II but modern Unreal Engine quality with just a prompt? Horror movie but "cute, pink and bunnies only"? I don't know!
- zoom call backgrounds+++
- real-time try on clothes virtually
- glasses: e.g. cartoonify your vision in real time?
- we can now build Harry Potter Mirror of Erised, showing the "raw feed" of you in the mirror but augmented with your deepest desires (as inferred by the AI).
- I don't know, I'm probably missing the biggest one, so many things!
(Disclosure: I am a (very small) angel investor in Decart. I was excited because imo this technology will get very good very fast and it feels general and powerful, but it's also technically very difficult. Congrats on the launch to the team!)
Finished your project homepage? 🚀
Next step: create an app for everyone to experience! With Anycoder by @_akhaliq, you can instantly generate Python Gradio apps. 💻✨
Start building now: https://t.co/aWUfb8lgeK
#Gradio #AnyCoder #vibecoding