fig @figbrains - Twitter Profile

about 22 hours ago

Dataset here: https://t.co/RFdGroflPO This work is a small iteration building on tremendous research from folks like @xwang_lk, @TianbaoX, @PangWeiKoh, @lateinteraction, @ZhiruoW, @Adamlu28 and many others. What an exciting time to be working in this field!

0

5

2

1

142

figbrains retweeted

about 22 hours ago

GUI-Perturbed is open source. Use it to evaluate your own models. Pipeline: https://t.co/b6DBxRG1My Technical Report: https://t.co/ge5sKrFUE4 Results Viewer: https://t.co/AbKIP3rgPl

1

5

1

0

85

figbrains retweeted

about 22 hours ago

The three (Qwen2.5-VL-7B, UI-TARS-1.5-7B, GTA1-7B) share a base checkpoint but differ in post-training, so any gap in robustness comes from the training recipe, not the architecture. Which model do you think would perform the best?

1

5

2

1

138

figbrains retweeted

about 22 hours ago

The team @figbrains, along with our friends @manifoldrg, took three of the best computer-use models and, surprisingly, broke all of them with very simple perturbations like changing zoom or colors. Read on to understand our research, including a new SoTA Evaluation Dataset for Browser-use models + a new kind of interactive data sandbox!

3

16

5

1

2K

figbrains retweeted

5 days ago

At @figbrains, we’re testing frontier models (Fable, Kimi, etc.) on simple web tasks that should be solvable. They failed in ways that wouldn't stump a human (we think). Results coming in a few days, but we want to see how good humans are: Which change causes the most failures?

2

9

5

2

263

figbrains retweeted

7 days ago

GUI-DR confirms an intuition we at @figbrains have had for a while: today’s computer control models often overfit to specific interfaces rather than learning the underlying task. Systematic GUI perturbations significantly reduce model performance. Read more below!

0

2

1

0

161

figbrains retweeted

Joe Habibi Hakim

@joehabibii

7 days ago

It was great working with @figbrains on GUI-DR! We applied domain randomization from robotics to vary visual scenes and instructions, exposing fragile model behaviors like confusing the browser search bar with the formula bar in Google Sheets.

0

4

3

0

184

figbrains retweeted

7 days ago

The Software Control research team at Manifold has been working on advancing new frontiers in long horizon computer control & grounding with @figbrains. Check out some of our early research below, with more to come soon!

0

3

1

0

59

7 days ago

GUI-DR applies domain randomization from robotics, varying visual scenes and instructions along controlled axes to expose fragile model behaviors.

7 days ago

Computer Control models can score 90%+ on standard benchmarks, but will fail when you set page zoom to 70%. We built GUI-DR, an OS pipeline that can restyle, reposition, and remove DOM elements on real webpages to reveal model weaknesses that fixed-scene benchmarks miss.

1

6

2

729

0

1

0

1

23

18 days ago

Fig wants to directly support researchers working on foundationally new takes on frontier models - targeting hard problems like long horizon multi-environment action. Reach out to contact @ fig . inc if you're working on these or related areas.

18 days ago

This week at #CVPR2026 we presented MultiNet v1.0 at the MMFM workshop. It is a benchmark built around a question most evaluations skip: what happens to a multimodal model when you take it out of the one domain it was trained for and ask it to handle everything at once?

2

6

2

0

1K

0

2

1

0

136

figbrains retweeted

19 days ago

Loved @pliang279’s #CVPR2026 talk on AI modalities beyond vision/language: touch, smell, etc. The vision-tactile retrieval work reinforces that good representations make hard-to-observe signals queryable. We’re applying a similar lens to trajectories at @figbrains. More soon!

0

7

4

0

224

figbrains retweeted

20 days ago

We built MultiNet v1.0 to test how well frontier models generalize across domains from text to robotics to gameplay and found surprising patterns of failure. We're presenting at the #CVPR2026 MMFM workshop @ 3PM, room Four Seasons 4. Come hear where & how they break!

0

8

5

0

455

20 days ago

Come meet the Fig team @ CVPR this week, today through Friday!

21 days ago

Headed to #CVPR2026! I'll be there on behalf of @figbrains and @ManifoldRG, presenting our research on next-generation multimodal models and evaluation systems. If you're into multimodal models, VLAs, or how we actually evaluate them, come say hi - I'd love to talk!

0

10

4

0

774

0

130

figbrains retweeted

20 days ago

I’ll be at CVPR in Denver, along w/ some brilliant colleagues 🚀 If you’re around anytime over the next few days and interested in computer control or long horizon robotics, please reach out - the @figbrains team is around! We’d love to give a sneak peek at what we’re building.

0

6

3

1

430

figbrains retweeted

7 months ago

Our next Frontiers Talk is on Tuesday, Dec 2 at 12 PM PDT. @pranavguru13, Founding Research Engineer at @figbrains and lead for MultiNet at Manifold, will walk through how to build the next generation of multimodal benchmarks for functional intelligence. Register below! https://t.co/Q0Na47Ck9c

1

4

2

0

1K

7 months ago

Get a sneak peek at our work on how to evaluate the next generation of multimodal models!

0

1

0

246

figbrains retweeted

7 months ago

Our next Frontiers Talk is this Friday, Nov 21 at 12 PM PDT! @pranavguru13, Founding Research Engineer @figbrains and Research Lead for MultiNet at Manifold, will share how to build the next generation of multimodal benchmarks for functional intelligence. Register below! https://t.co/oEmrZ5uPIR

1

5

3

0

286

figbrains retweeted

8 months ago

Thrilled to share MultiNet v1.0 with the research community - a collaboration between research groups @figbrains, Manifold Research, @GeorgiaTech, and @MIT. This benchmark reveals critical limitations in how current AI systems generalize across domains. 🧵

1

5

4

0

403

8 months ago

We're grateful to work w/ research teams @ManifoldRG @GeorgiaTech and @MIT. Explore the benchmark: https://t.co/WMl4At9rUL and let us know what you think!

0

2

0

59

8 months ago

We're excited to announce MultiNet v1.0 - the first cross-domain benchmark for multimodal AI systems. Unlike existing evaluations that test models within single domains, MultiNet reveals what happens when AI systems encounter the full complexity of real-world tasks.

1

9

3

0

1K