SAI @CompeteSai - Twitter Profile

3 months ago

@sihing_guppy full write-up: https://t.co/jjged9kI0b

0

1

0

7

3 months ago

@sihing_guppy More work needs to be done to ensure robustness to low-contrast perturbations.

0

10

3 months ago

@sihing_guppy The technical blog series is here: https://t.co/fTkTgPZJS0

1

2

0

20

3 months ago

@realmc @sihing_guppy can't wait for part 3!

0

2

0

28

3 months ago

@sihing_guppy Full technical blog and interactive perturbation UX: https://t.co/fTkTgPZJS0

2

5

1

0

94

3 months ago

@sihing_guppy Line of the Week: many models rely on appearance far more than they admit.

0

2

0

55

CompeteSai retweeted

3 months ago

https://t.co/Lw3jTrEtgN

4

21

8

5

3K

3 months ago

@sihing_guppy Performance should not be stable across all those perturbation variations, unless the model is not looking at the language input.

1

3

0

20

3 months ago

@realmc Let the Chinese-speaking world hear our voice!

1

4

0

37

CompeteSai retweeted

Malcolm Chan | NRN

@realmc

3 months ago

https://t.co/FKGTnqN7tk

4

7

2

3

478

3 months ago

@sihing_guppy blog: https://t.co/TTWC1cgJM4

1

5

1

0

31

3 months ago

@sihing_guppy We ran this on a fine-tuned OpenPI policy on LIBERO Spatial. The full perturbation analysis covers 11 language corruption types: https://t.co/TTWC1cgJM4 https://t.co/3tt9xg2bGZ

3 months ago

https://t.co/B8rLHPZ7O0

3

18

6

3K

1

7

2

0

248

CompeteSai retweeted

3 months ago

What happens when you remove a robot's ability to read its instructions? Almost nothing.
> Full model → 95% success
> Remove language → 94% (▼1%)
> Remove vision → 13% (▼82%)
Near-blind without vision. Near-indifferent to language.
If your evaluation only tests correct instructions, you're not measuring language. You're measuring vision.

sihing_guppy's tweet photo.

2

13

6

3

411
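The arrows in the tweet above are absolute drops in percentage points relative to the full model. A minimal sketch of that arithmetic, using only the three success rates quoted in the tweet; the function and variable names are illustrative and not taken from the linked write-up:

```python
# Success rates quoted in the tweet above, in percent.
full_model = 95
ablations = {"remove language": 94, "remove vision": 13}


def ablation_drops(baseline, ablated):
    """Return the drop in percentage points for each ablated input."""
    return {name: baseline - rate for name, rate in ablated.items()}


drops = ablation_drops(full_model, ablations)
for name, drop in drops.items():
    print(f"{name}: down {drop} points")
```

Reading the drops side by side is what supports the claim: removing vision costs far more than removing language.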

CompeteSai retweeted

3 months ago

A robot receives a language command "pick up the black bowl and place it on the plate" and executes it. We replaced the command with: "My name is Franka." No task. No object. No action verb. It picked up the bowl and placed it on the plate. The language instruction isn't being read. The scene is being acted upon. The prompt is decoration.

sihing_guppy's tweet photo.

4

12

5

2

448
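The instruction swap described in the tweet above can be framed as a simple probe: run the same episodes with the real command and with an irrelevant sentence, then compare success rates. The sketch below is an assumption about how such a probe might look, not the authors' harness. `run_episode` is a hypothetical callable, and `vision_only_stub` is a made-up stand-in for a policy that ignores its text input; its outcomes are not real results.

```python
from typing import Callable


def success_rate(run_episode: Callable[[str, int], bool],
                 instruction: str, episodes: int) -> float:
    """Fraction of seeded episodes that succeed under one instruction."""
    wins = sum(run_episode(instruction, seed) for seed in range(episodes))
    return wins / episodes


def language_gap(run_episode, task_instruction, probe_instruction, episodes=20):
    """Success with the real command minus success with the irrelevant one."""
    real = success_rate(run_episode, task_instruction, episodes)
    probe = success_rate(run_episode, probe_instruction, episodes)
    return real - probe


def vision_only_stub(instruction: str, seed: int) -> bool:
    """Hypothetical policy that never reads the instruction (illustration only)."""
    return seed % 10 != 0  # outcome depends on the seed, not on the text


gap = language_gap(
    vision_only_stub,
    "pick up the black bowl and place it on the plate",
    "My name is Franka.",
)
print(f"language gap: {gap:.2f}")
```

A gap near zero on a real policy would suggest the instruction is not driving behavior; a large positive gap would suggest it is.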

3 months ago

@sihing_guppy Full analysis with 11 perturbation types across 3 severity levels, including why the flat sensitivity curve is the real finding: https://t.co/TTWC1cgJM4

1

5

0

40

3 months ago

@realmc @sihing_guppy

0

1

0

95

CompeteSai retweeted

3 months ago

https://t.co/B8rLHPZ7O0

3

18

6

3K

3 months ago

We’ll be at @NVIDIAGTC next week in San Francisco. @sihing_guppy and @MarcAlloul will be there talking about model transparency and evaluation for robotic policies: how to understand what your models are actually doing before you deploy them. If you’re building or deploying robotic systems, send us a DM or come find us. We’d love to chat.

CompeteSai's tweet photo.

4

17

4

2

2K