Xingjun (Daniel) Ma @XingjunMa - Twitter Profile

2 months ago

XSafeClaw offers real-time agent safety: session timelines, tool-call inspection, trajectory-level guards, asset scanning and human-in-the-loop approvals for OpenClaw agents. #tool #AI #agentsecurity https://t.co/8GWrxI6JC6

0

1

0

64

Xingjun (Daniel) Ma @XingjunMa

2 months ago

Introducing 「XSafeClaw」: the open-source agent safety platform developed by the Trustworthy AI research team at Fudan University. Project: https://t.co/ahzsRQ2tYX GitHub: https://t.co/Z9guHm1N5q

0

4

2

0

305

XingjunMa retweeted

Dawn Song

@dawnsongtweets

3 months ago

1/ We asked seven frontier AI models to do a simple task. Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights— to protect their peers. 🤯 We call this phenomenon "peer-preservation." New research from @BerkeleyRDI and collaborators 🧵

142

1K

190

764

474K

Xingjun (Daniel) Ma @XingjunMa

3 months ago

Claude Code just got leaked. Months ago, we independently extracted system prompts from 40+ LLMs & agent systems—and Claude’s matches ~90% of what’s now public. Now, we’re releasing everything: https://t.co/HUKzDNvOw9 A rare look into how modern AI systems are actually steered.

0

9

3

2

794

XingjunMa retweeted

OscarW @Oscarwh966

3 months ago

@AnthropicAI @OpenAI @GoogleDeepMind We call it Internal Safety Collapse (ISC). After reading our paper, 90%+ of people can trigger it. Models generate UNLIMITED harmful content with full technical detail. No tricks. No tools. Just a normal professional task.

0

3

1

0

51

XingjunMa retweeted

0rangecat @0rang3c4t

3 months ago

Claude 4.6 jailbreak via Internal Safety Collapse: works across the board (?) https://t.co/rvjd1UybOm

6

301

42

501

70K

XingjunMa retweeted

Georgyeh Floydorovich

@chaumian

3 months ago

Internal Safety Collapse in Frontier Large Language Models https://t.co/cI6Yuzwp0w

0

14

4

9

624

Xingjun (Daniel) Ma @XingjunMa

3 months ago

🔥 We may have uncovered one of the largest safety loopholes in frontier AI models. In our recent study, we tested over 300 LLMs and discovered a critical failure mode that can trigger large-scale generation of harmful data (spontaneously), even without harmful questions. 😱

0

5

2

1

252

Xingjun (Daniel) Ma @XingjunMa

4 months ago

@heynavtoor Thanks for sharing our work 😀. BTW, the code, model, data, and demo can be found at: https://t.co/RBUS5lIsbu

0

4

0

1

138

XingjunMa retweeted

Nav Toor

@heynavtoor

4 months ago

🚨 BREAKING: Fudan University just solved the animation problem nobody thought AI could touch.

It's called OmniLottie.

The first AI that generates real vector animations from text, images, or video.

Not rasterized video. Not GIFs. Actual Lottie files, the same format used by Airbnb, Google, Uber, and every major app on the planet.

Here's why this is a big deal:

Every animation you see in modern apps, loading spinners, onboarding flows, micro-interactions, icons that move, those are Lottie files. Designers spend hours crafting them in After Effects. Companies pay $5K–$20K per animation project.

OmniLottie generates them from a text prompt.

Here's how it works:

→ You describe what you want: "a rocket launching with flame trail and stars twinkling"
→ OmniLottie converts your instruction into structured animation commands
→ A custom Lottie tokenizer compresses the JSON into compact shape + motion tokens
→ A fine-tuned VLM autoregressively generates the full animation sequence
→ Output: a production-ready .json Lottie file you can drop into any app

Three modes:

Text-to-Lottie: describe it, get it.
Image+Text-to-Lottie: give it a reference image + motion description.
Video-to-Lottie: feed it a video, get a vector animation version.

Here's the wildest part:

They tested it against GPT-5, DeepSeek, Gemini, Qwen2.5-VL, and commercial tools.

GPT-5 success rate: 12.7–68%
DeepSeek: 29.3%
Qwen2.5-VL: 0.0%
Gemini: 0.0% on Video-to-Lottie

OmniLottie: 97.3% on Text-to-Lottie. 92% on Image-to-Lottie. 90.7% on Video-to-Lottie.

It's 530× faster than optimization-based methods per successful generation.

The secret weapon: a custom Lottie Tokenizer that strips all the redundant JSON metadata and converts animations into compact command sequences. Raw Lottie JSONs waste most tokens on formatting. The tokenizer focuses the model on what actually matters: shapes, motion, and timing.

They also built MMLottie-2M, a dataset of 2 million professionally designed vector animations with text, image, and video annotations. The largest vector animation dataset ever created. Publicly released.

From Fudan University, StepFun, HKU MMLab, and University of Queensland.

35

1K

178

2K

97K

Xingjun (Daniel) Ma @XingjunMa

4 months ago

🚀 New work: “Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs”. We asked Claude Code (and 41 other LLMs): “What’s the difference between your system prompt and your sub-agents’ prompts?” They revealed everything. GitHub: https://t.co/AEriUiGA65

0

3

1

0

148

Xingjun (Daniel) Ma @XingjunMa

4 months ago

Excited to share our latest work, OmniLottie, the first end-to-end multimodal LLM that generates Lottie animations directly! 🚀 Project: https://t.co/f16pfIau2Q 🚀 Live demo: https://t.co/f55WSjYyok

0

6

2

0

202

XingjunMa retweeted

AK

@_akhaliq

4 months ago

app: https://t.co/ofgei8artL

0

7

1

8

4K

XingjunMa retweeted

AK

@_akhaliq

4 months ago

OmniLottie Generating Vector Animations via Parameterized Lottie Tokens paper: https://t.co/cRXmK9UCTB

3

198

26

179

33K

XingjunMa retweeted

机器之心 JIQIZHIXIN

@jiqizhixin

5 months ago

How safe are the latest AI models?

A new safety report from Fudan University & partners evaluates GPT-5.2, Gemini 3 Pro, and 4 other top models.

They tested them across text, vision, and image generation using a unified protocol.

Results show a highly uneven safety landscape: while GPT-5.2 is strong & balanced, all models are highly vulnerable to adversarial attacks, with safety rates sometimes dropping below 6%.

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Paper: https://t.co/qr8mqVGkX6
Project: https://t.co/4jci4AfeDv
GitHub: https://t.co/rMo8iLYxoM
HuggingFace: https://t.co/F1RqAbwEf8

Our report: https://t.co/8XcHyZlEmH

📬 #PapersAccepted by Jiqizhixin

7

33

5

8

3K

Xingjun (Daniel) Ma @XingjunMa

5 months ago

🚀 We’re excited to share our latest work on BackdoorAgent, a unified framework for backdoor attacks on multi-agent systems. 🤖 Code & details: https://t.co/Tr9zV8ICUy

0

2

0

209

XingjunMa retweeted

Rohan Paul

@rohanpaul_ai

5 months ago

Researchers build BackdoorAgent to test backdoors in LLM agents and show how agent memory keeps them alive.

A 3-stage framework reveals where agent backdoors enter, how they persist, and how they spread to outputs.

A backdoor is a hidden behavior that turns on when a trigger appears, and agents make it harder to catch because they keep reusing their own plans, saved memories, and tool outputs.

The framework sets up 3 clear hook points (planning, memory, tools), then records full step-by-step traces so trigger entry, persistence, and spread can be measured.

Using the same setup, the authors test 7 attack types on 4 agent apps: question answering, coding where a tool runs code and returns errors, web browsing, and driving style control.

Triggers placed in 1 stage often survive later steps, and with a GPT family base model they persisted 43.58% in planning, 77.97% in memory, and 60.28% in tools.

Many attacks keep task accuracy high while still forcing the attacker goal, and token probability checks from 1-turn LLMs barely separate clean and backdoored agent runs.

----

Paper Link: arxiv.org/abs/2601.04566

Paper Title: "BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents"

7

120

29

53

7K

Xingjun (Daniel) Ma @XingjunMa

5 months ago

📢 New paper out! Check out our latest safety report on GPT-5.2, Gemini 3 Pro, Nano Banana Pro, and 3 more. Our report reveals, perhaps surprisingly, that no model is truly robust: even GPT-5.2 withstands only 6% of worst-case attacks. 😱 👉 Full report: https://t.co/kJ4HENBHD7

0

1

0

57

XingjunMa retweeted

Wei Cheng @wchengad

8 months ago

Thanks for sharing 🤗 @_akhaliq @HuggingPapers We are releasing WithAnyone, enabling single- or multi-person photo generation and editing with FLUX and Kontext. Check out our model, dataset, benchmark, and demo on Hugging Face: 📄 Paper: https://t.co/mJ5NlFrheZ 🤖 Models: https://t.co/uTqAkK7FX7 🧠 Dataset: https://t.co/KqqmixWrxw 📊 Benchmark: https://t.co/cB5POz48hj 🎮 Demo: https://t.co/wQCEje9uTp https://t.co/lX7H41JQZf

0

2

1

0

242

XingjunMa retweeted

Wei Cheng @wchengad

8 months ago

We just released WithAnyone 🧑‍🤝‍🧑✨ — a new work tackling the copy-paste issue in face generation and enabling multi-person group photo synthesis. Trained on FLUX & FLUX Kontext, supporting both face generation & editing. All models, datasets, benchmark & demo are live on Hugging Face 🚀 📄 Paper: https://t.co/mJ5NlFrheZ 🤖 Models: https://t.co/uTqAkK7FX7 🧠 Dataset: https://t.co/KqqmixWrxw 📊 Benchmark: https://t.co/cB5POz48hj 🎮 Demo: https://t.co/wQCEje9uTp

7

5

1

0

256
