Chenxin Li

@XGGNet

PhD student@CUHK | Agent, Multimodal LLM, World Model

Joined February 2022

674 Following

238 Followers

194 Posts

Chenxin Li @XGGNet

about 1 month ago

LinMem write-up is out 🧠⚡

Agent memory should not stop at “vector DB glue”. If agents keep getting longer tasks, richer trajectories, and more persistent context, we need memory mechanisms that are more native to sequence modeling itself 🔁

This write-up explores:
🧩 linear attention as parametric long-term memory
🔍 softmax-style reasoning for deciding when to query memory
📚 memory as a trainable mechanism, not just retrieval middleware
🤖 how this could fit future agent loops and long-horizon workflows

The key bet: the next useful agent memory layer may look less like a folder of embeddings, and more like a learnable state system ⚙️🧠

Notes + links: https://t.co/zcnyxoNfgP
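For readers unfamiliar with the idea of "linear attention as memory", here is a minimal sketch of the generic technique, not the LinMem implementation (the post does not describe its internals). It assumes the standard linear-attention formulation with the feature map phi(x) = elu(x) + 1; the class name `LinearAttentionMemory` and its `write`/`read` methods are hypothetical names chosen for illustration.

```python
import math
from typing import List


def feature_map(x: List[float]) -> List[float]:
    """phi(x) = elu(x) + 1, which is strictly positive for every input."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


class LinearAttentionMemory:
    """Fixed-size associative memory (illustrative sketch, not LinMem itself).

    The state S accumulates phi(k) v^T over all writes, and z accumulates
    phi(k). A read returns (phi(q)^T S) / (phi(q)^T z), i.e. a weighted
    average of stored values with weights phi(q) . phi(k).
    """

    def __init__(self, key_dim: int, value_dim: int) -> None:
        self.value_dim = value_dim
        self.state = [[0.0] * value_dim for _ in range(key_dim)]
        self.norm = [0.0] * key_dim

    def write(self, key: List[float], value: List[float]) -> None:
        phi_k = feature_map(key)
        for i, pk in enumerate(phi_k):
            self.norm[i] += pk
            for j, v in enumerate(value):
                self.state[i][j] += pk * v

    def read(self, query: List[float]) -> List[float]:
        phi_q = feature_map(query)
        denom = sum(pq * z for pq, z in zip(phi_q, self.norm))
        if denom == 0.0:
            # Nothing has been written yet.
            return [0.0] * self.value_dim
        return [
            sum(pq * row[j] for pq, row in zip(phi_q, self.state)) / denom
            for j in range(self.value_dim)
        ]
```

The memory cost stays constant no matter how many items are written, which is the property that makes this kind of state attractive for long trajectories. The trade-off is that reads are blended averages rather than exact lookups.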

Chenxin Li @XGGNet

about 1 month ago

@aroido_bigcat Exactly!

Chenxin Li @XGGNet

about 1 month ago

Agents need their own lane to use real Mac apps without stealing your mouse. Open Claudex Computer Use is an open-source macOS execution layer for Claude Code, Codex, and MCP agents: AX + screenshots + virtual cursor. https://t.co/Ndvicdsybe

Chenxin Li @XGGNet

about 1 month ago

Agent-native Slides best practice is out 🎞️🤖

Most AI slide failures are not “model failures”. They are workflow failures: messy source material, no mother draft, fragile PPT editing, and no reproducible path from idea → deck 🧱

My current best practice:
📝 Feishu Docs + lark-cli as the mother draft layer
💻 Slidev for code-first, version-controlled decks
🧩 Agent skills / MCP tools for structured generation
📦 PPTX only at the final delivery/export stage

The key bet: in the Agent era, slides should be treated like software artifacts. Readable source, editable structure, reproducible builds, and human review at the right nodes ⚙️✨

Full write-up: https://t.co/HPs6Jsz8a4

Chenxin Li @XGGNet

about 1 month ago

📝🚀 Introducing OpenReview Agent

Submitting or resubmitting papers through OpenReview often involves a lot of repetitive but high-risk work: author profiles, venue-specific fields, keywords, code/data links, LLM usage declarations, checklists, reviewer suggestions, anonymity rules, and schema requirements.

These details are easy to underestimate, until something is missing, mismatched, or formatted incorrectly near a deadline 😵‍💫

So we built **OpenReview Agent**: an AI agent skill + CLI toolkit for safer, dry-run-first OpenReview submission workflows.

The goal is not to blindly automate paper submission. The goal is to help researchers and AI agents inspect, validate, transfer, and prepare submission payloads safely before anything is written to OpenReview.

It can help with:
🧩 Inspect existing submissions
👤 Match author OpenReview profiles
🔁 Plan cross-venue transfers
📋 Validate target venue schemas
🧪 Generate dry-run submission payloads
📦 Batch-create submission drafts
🔐 Catch author ID, anonymity, schema, typo, and formatting issues early

The workflow is deliberately conservative:
🔍 inspect first
🧭 plan second
🧪 dry-run third
✅ apply only with explicit confirmation
🔁 verify after writing

We hope this can reduce low-value procedural work and let researchers spend more attention on the paper itself.

Still early: `0.1.0-alpha`.

💻 Code: https://t.co/Uoyth07Os8

Feedback, issues, and contributions are very welcome 🙌

#OpenReview #NeurIPS #ECCV #AIagents #ResearchTools #AcademicPublishing #PeerReview #OpenSource
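The conservative "plan, dry-run, apply only with explicit confirmation" pattern described in the post can be sketched roughly as follows. This is an illustrative sketch only, not OpenReview Agent's code or the OpenReview API: `Plan`, `validate`, `plan_submission`, `apply_plan`, and the `writer` callable are all hypothetical names, and the only validation shown is a required-field check. The final "verify after writing" step is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Plan:
    """A prepared payload plus any problems found while validating it."""
    payload: Dict[str, object]
    problems: List[str] = field(default_factory=list)


def validate(payload: Dict[str, object], required_fields: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means the payload passes."""
    problems = []
    for name in required_fields:
        value = payload.get(name)
        if value is None or value == "" or value == []:
            problems.append(f"missing required field: {name}")
    return problems


def plan_submission(payload: Dict[str, object], required_fields: List[str]) -> Plan:
    """Build a plan without writing anything anywhere."""
    return Plan(payload=dict(payload), problems=validate(payload, required_fields))


def apply_plan(plan: Plan, writer: Callable[[Dict[str, object]], None],
               confirm: bool = False) -> str:
    """Write only when the plan is clean AND the caller explicitly confirms."""
    if plan.problems:
        raise ValueError("refusing to write: " + "; ".join(plan.problems))
    if not confirm:
        # Default path: a dry run that has no side effects.
        return "dry-run: nothing written"
    writer(plan.payload)  # hypothetical write hook supplied by the caller
    return "applied"
```

Making the dry run the default and requiring `confirm=True` for any write means that a forgotten flag fails safe: the worst outcome of a mistake is that nothing is submitted.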

Chenxin Li @XGGNet

about 1 month ago

🖥️ Giving GUI Agents a background lane on macOS

We recently open-sourced a new project:
📌 Open Claudex Computer Use 🖥️✨
An open-source Computer Use MCP Server for macOS.

The problem we wanted to solve is very concrete: For many GUI agents, the biggest pain point is not whether they can click buttons. It is that once they start operating your computer, they take over your foreground screen, mouse, and keyboard.

That creates an awkward situation: When the agent is using my computer, I can no longer use my computer. Humans and GUI agents cannot truly coexist this way.

So with Open Claudex Computer Use, we are trying to give agents a relatively independent GUI “background lane” 🚗

The agent can operate real macOS apps in the background: read app state, observe screenshots, click, type, scroll, drag, and interact with UI elements. At the same time, it shows what it is doing through a virtual cursor, instead of directly hijacking your physical mouse.

In other words: The agent no longer has to stand in the middle of your foreground screen to get work done. You can keep doing your own work, while the agent operates real apps on another “track”.

This initial version includes:
🧩 macOS app state reading
📸 Screenshots and visual observation
🖱️ Click / scroll / drag
⌨️ Text input and keyboard actions
🧭 Virtual cursor visualization
🛠️ Claude Code / Codex / MCP client integration
🍎 Support for Safari, Notes, Finder, TextEdit, Calculator, and other real Mac apps

More importantly, the community has been missing an open-source macOS Computer Use execution layer. Official computer-use capabilities are not fully open-source, so we built an open implementation that developers can try, modify, and plug into their own agent workflows.

The project is currently at 0.1.0-alpha, so it is best suited for developers, MCP builders, AI agent researchers, and macOS automation enthusiasts who want to experiment early.

💻 Code: https://t.co/ekOg6GawbB

If you believe future agents should not only answer questions, but actually use computers together with humans in a more cooperative way, we would love for you to try it, open issues, and share feedback 🙌

#OpenClaudex #GUIAgent #MCP #ComputerUse #AIAgent #macOS #OpenSource

Chenxin Li @XGGNet

about 1 month ago

Claw-Eval-Live is out 🦞, a live extension of the Claw-Eval Family!

This live release includes:
105 tasks | 17 workflow families | 13 frontier models tested | quarterly refresh from real ClawHub marketplace signals.

Instead of relying on a static task set, Claw-Eval-Live keeps agent evaluation aligned with evolving real-world enterprise workflows.

A key finding: the bottleneck is not terminal use or environment setup, but cross-system business workflows that require evidence-grounded execution.

Built together with @_TobiasLee and the Claw-Eval core author team, extending the Claw-Eval family into a live benchmark for evolving real-world workflows!

🤗 HF Paper: https://t.co/ZRsFBflS61
Arxiv Paper: https://t.co/lXwJootTve
Leaderboard: https://t.co/SGvtrDt3iU
Code: https://t.co/nDOSIuAub0

🧵 Here are our findings:

XGGNet retweeted

Yujia Qin @TsingYoga

6 months ago

This is the only meaningful benchmark

XGGNet retweeted

Yifan Jiang

@YifanJiang17

6 months ago

not as good as nano banana pro and also slower than nano banana pro ☹️

XGGNet retweeted

Kairun Wen

@KairunWen

6 months ago

🦋DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling Excited to share that our work is in #NeurIPS2025! - A large-scale 4D + instance + semantics + caption dataset with 100K in-the-wild scenes, supporting 4D world modeling by combining classic 3D reconstruction with feed-forward methods. - A novel automated data curation pipeline designed to generate physically-aware multi-modal 4D data at scale.

XGGNet retweeted

MrNeRF

@janusch_patas

7 months ago

Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models Contributions: • We propose DIFF4SPLAT, a unified diffusion-based model that directly generates deformable 3D Gaussians for controllable 4D scene synthesis. • We construct a large-scale 4D dataset from synthetic and in-the-wild videos, annotated with appearance, metric-scale geometry, and motion. • Extensive experiments demonstrate that DIFF4SPLAT produces high-fidelity 4D scenes from a single image, outperforming two-stage pipelines and existing camera-controlled video generation methods in both quality and efficiency.

XGGNet retweeted

Startup Archive

@StartupArchive_

12 months ago

Mark Zuckerberg on the best advice Peter Thiel ever gave him “Peter was the person who told me this really pithy quote that, ‘In a world that’s changing so quickly, the biggest risk you can take is not taking any risk.’ And I really think that that is true.” Mark continues: “Whenever you get yourself into a position where you have to make some big shift in direction or do something, there are always people who are going to point to the downside risks of that decision — and locally they may be right. For any given decision you make, there’s upside and downside. But in aggregate, if you are stagnant and you don’t make those changes, then I think you’re guaranteed to fail and not catch up. So to some degree, I think it’s really right that, over time, the biggest risk you can take is to not take any risks.” Video source: @ycombinator (2016)

XGGNet retweeted

歸藏(guizang.ai)

@op7418

10 months ago

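Nano Banana's visual reasoning is remarkably strong. Two examples: it can infer the photographer's position from an existing photo and mark it, and it can generate the corresponding landmark view from a map screenshot. This is really impressive. Here I marked the Oriental Pearl Tower, and the viewing direction it produced matches the arrow.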

Chenxin Li @XGGNet

11 months ago

@paulpanwang @emmanuel_2m Amazing impact!

XGGNet retweeted

dex

@dexhorthy

12 months ago

@karpathy wrote about this here! https://t.co/0OSQwXgMvz

XGGNet retweeted

tobi lutke

@tobi

12 months ago

I really like the term “context engineering” over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.

XGGNet retweeted

Andrej Karpathy

@karpathy

12 months ago

+1 for "context engineering" over "prompt engineering". People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step. Science because doing this right involves task descriptions and explanations, few shot examples, RAG, related (possibly multimodal) data, tools, state and history, compacting... Too little or of the wrong form and the LLM doesn't have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down. Doing this well is highly non-trivial. And art because of the guiding intuition around LLM psychology of people spirits. On top of context engineering itself, an LLM app has to: - break up problems just right into control flows - pack the context windows just right - dispatch calls to LLMs of the right kind and capability - handle generation-verification UIUX flows - a lot more - guardrails, security, evals, parallelism, prefetching, ... So context engineering is just one small piece of an emerging thick layer of non-trivial software that coordinates individual LLM calls (and a lot more) into full LLM apps. The term "ChatGPT wrapper" is tired and really, really wrong.

XGGNet retweeted

Andrej Karpathy

@karpathy

11 months ago

Diffusion video models but now - **realtime**! Simple video filters are real-time but can only do basic re-coloring and styles. Video diffusion models (Veo and friends) are magic, but they take many seconds/minutes to generate. MirageLSD is real-time magic. Unlike simple video filters, diffusion models actually *understand* what they are looking at, so they can style all parts of the feed intelligently (e.g. putting hats on heads, or light sabers into hands, etc.). And they are arbitrarily steerable, e.g. by text prompts. Customizable, intelligent video filters unlock many cool ideas over time: - transform camera feeds into alternate realities - direct and shoot your own movies, acting out scenes with props. Realtime => instant feedback/review. - vibe code games around just simple spheres/blocks, then use a real-time diffusion model to texture your game to make it beautiful. - style and customize any video feed: games, videos, ... e.g. Skyrim but "MORE EPIC"? DOOM II but modern Unreal Engine quality with just a prompt? Horror movie but "cute, pink and bunnies only"? I don't know! - zoom call backgrounds+++ - real-time try on clothes virtually - glasses: e.g. cartoonify your vision in real time? - we can now build Harry Potter Mirror of Erised, showing the "raw feed" of you in the mirror but augmented with your deepest desires (as inferred by the AI). - I don't know, I'm probably missing the biggest one, so many things! (Disclosure I am (very small) angel investor in Decart, I was excited because imo this technology will get very good very fast and it feels general, powerful but it's also technically very difficult. Congrats on the launch to the team!)

XGGNet retweeted

Panwang Pan @paulpanwang

11 months ago

Finished your project homepage? 🚀 Next step: create an app for everyone to experience! With Anycoder by @_akhaliq , you can instantly generate Python Gradio apps. 💻✨ Start building now: https://t.co/aWUfb8lgeK #Gradio #AnyCoder #vibecoding

XGGNet retweeted

@_akhaliq

11 months ago

MoVieS Motion-Aware 4D Dynamic View Synthesis in One Second
