Omnia @omnia_io - Twitter Profile

Pinned Tweet

over 1 year ago

The app model for extended reality (XR) content is not going to work. No downloads; everything contextual. Driven by GenUI, enhanced by AI agents and, most importantly, developed from an open-source library you can expand without limits. (Omnia's plan for 2025 onwards.)


Omnia

@omnia_io

about 1 month ago

@AdamDraper (:


Omnia

@omnia_io

about 1 month ago

Most smart glasses demos show someone reading instructions off a screen.

That's not what we built.

At Comer Industries (Rockford, IL), we analyzed 1,025 measurement records across 13 assembly stations.

The Settle Bearing step alone was failing 85.7% of the time. One unit needed 12 consecutive reruns on a single step with zero guidance on what to change between attempts.

76% of all failures had a measured value of exactly zero. The machine ran, captured nothing, logged a defect. Not a bad part. Not a broken tool. The worker initiated the cycle before a physical prerequisite was met, because nothing stopped them.

Here's what we built:

OmniaAgent connects Rokid smart glasses to the legacy OEM software running on the station. The machine completes a press cycle → glasses HUD advances automatically. Worker confirms a step by voice → station software advances in sync.

On top of that: a CV model trained on footage from Comer's own equipment, their own parts, their own assembly line. Not a generic model. One that knows what a correctly seated bearing cup looks like on this fixture before this press descends. It checks before the cycle runs. It blocks the invalid cycle before the machine fires.

The result: 71.9% of failures are directly addressable. $2.3M in projected annual savings, grounded in their actual production data, not estimates.

One station. Path to 100.

This is what enterprise AI on the factory floor actually looks like: agents integrated into legacy OEM software, CV models trained on the client's own equipment, error rates as the scoreboard.

#SmartGlasses #ManufacturingAI #ComputerVision #EnterpriseAI #AIAgents #Industry40
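The station flow described above (vision check gates the press, machine cycle advances the HUD, voice confirmation advances the station) can be sketched as a small state machine. This is a minimal sketch, assuming three hooks exist: a boolean result from the vision check, a "cycle complete" event carrying the measured value, and a voice transcript. The class and function names are hypothetical, not OmniaAgent's or the OEM software's API. Treating a reading of exactly zero as "nothing captured" (retry, not defect) is also an assumption drawn from the failure pattern in the post.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Station:
    """Hypothetical model of one assembly station; names are illustrative only."""
    steps: List[str]
    index: int = 0
    log: List[tuple] = field(default_factory=list)

    @property
    def current_step(self) -> Optional[str]:
        return self.steps[self.index] if self.index < len(self.steps) else None


def request_cycle(station: Station, prerequisite_ok: bool) -> bool:
    """Interlock: the press may only fire if the vision check passed."""
    if not prerequisite_ok:
        station.log.append(("blocked", station.current_step))
        return False
    return True


def on_cycle_complete(station: Station, measured_value: float) -> str:
    """Machine reports a finished cycle; HUD and station stay on the same step index.

    Assumption: a reading of exactly zero means nothing was captured, so the
    step is retried instead of being logged as a part defect.
    """
    step = station.current_step
    if measured_value == 0:
        station.log.append(("zero_reading", step))
        return "retry"
    station.log.append(("ok", step, measured_value))
    station.index += 1  # HUD advances automatically on a good cycle
    return "advanced"


def on_voice_confirm(station: Station, utterance: str) -> bool:
    """A spoken confirmation advances the step for both the HUD and the station."""
    if utterance.strip().lower() in {"confirm", "done", "next"}:
        station.index += 1
        return True
    return False
```

Keeping one shared `index` for both the glasses and the station software is the design choice that keeps them "in sync": neither side can advance without the other seeing the same step.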


Omnia

@omnia_io

2 months ago

OmniaClaw on actual glasses. Three demos.

1/ Machine instructions: custom JSON, mock coffee machine. Swap the JSON, swap the machine. Carrier unit, Trane chiller, PTO shaft: same runtime.

2/ Service log: worker speaks, row writes to sheet. No form. No tablet.

3/ Full operator flow on glass: voice, full-screen messages, supervisor path closing in real time.

Next layer: custom CV models on your equipment. The JSON tells the worker what to do. The vision model watches if they did it.

Custom demos available: manufacturing, HVAC, field service.
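"Swap the JSON, swap the machine" implies a runtime that knows nothing about any specific machine and only walks a step list. A minimal sketch, assuming a hypothetical schema with `machine` and `steps[].say` keys (the post does not publish OmniaClaw's actual JSON format):

```python
import json

# Hypothetical schema: the post only says "custom JSON"; these keys are assumptions.
COFFEE_MACHINE = """
{
  "machine": "Mock coffee machine",
  "steps": [
    {"id": "fill", "say": "Fill the water tank to the MAX line."},
    {"id": "pod", "say": "Insert a pod and close the lid."},
    {"id": "brew", "say": "Press the brew button."}
  ]
}
"""


class InstructionRuntime:
    """Machine-agnostic runner: the code never changes, only the JSON does."""

    def __init__(self, spec_json: str):
        spec = json.loads(spec_json)
        self.machine = spec["machine"]
        self.steps = spec["steps"]
        self.index = 0

    def current(self):
        """Text to show on the glasses, or None once the procedure is finished."""
        if self.index >= len(self.steps):
            return None
        return self.steps[self.index]["say"]

    def next(self):
        """Advance (e.g. on a voice confirmation) and return the new text."""
        if self.index < len(self.steps):
            self.index += 1
        return self.current()
```

Loading a chiller or PTO-shaft procedure is then just a different JSON string passed to the same `InstructionRuntime`.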


Omnia

@omnia_io

2 months ago

What would an OpenClaw-built agent for enterprise look like on smart glasses? OpenClaw + nemoclaw. It's built on a real catalogue-backed knowledge base with a Google Sheet as the source of truth, and OmniaClaw as a customized OpenClaw UI layer. What you can do today still bumps into documented OpenClaw limits (tool host, local bridge wiring, channel UX), but the demo shows the supervisor → operator dispatch → voice → sheet flow anyway.
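The supervisor → operator dispatch → voice → sheet flow can be sketched in a few lines. This is a minimal sketch under stated assumptions: a CSV writer stands in for the Google Sheet (no Sheets API calls are made), the transcript is assumed to come from some speech-to-text step already, and the column names and helper functions are hypothetical, not OmniaClaw's or OpenClaw's API.

```python
import csv
import io
from datetime import datetime, timezone

# Stand-in for the Google Sheet "source of truth"; column names are hypothetical.
COLUMNS = ["timestamp", "task_id", "operator", "note", "status"]


def dispatch(queue: dict, task_id: str, operator: str, instruction: str) -> None:
    """Supervisor side: push a task onto an operator's queue."""
    queue.setdefault(operator, []).append({"task_id": task_id, "instruction": instruction})


def log_voice_note(writer: csv.DictWriter, queue: dict, operator: str,
                   transcript: str, status: str = "done") -> dict:
    """Operator side: the spoken transcript becomes one sheet row. No form, no tablet."""
    task = queue[operator].pop(0)  # oldest dispatched task first
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "task_id": task["task_id"],
        "operator": operator,
        "note": transcript.strip(),
        "status": status,
    }
    writer.writerow(row)
    return row
```

Usage: create `csv.DictWriter(io.StringIO(), fieldnames=COLUMNS)`, call `dispatch(...)` from the supervisor path, then `log_voice_note(...)` when the operator speaks. Swapping the CSV writer for a real Sheets client would change only the `writer.writerow` line.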


Omnia

@omnia_io

2 months ago

now with captions! yeh


Omnia

@omnia_io

2 months ago

@pascal_bornet Or you deploy smart glasses with the right sensors


Omnia

@omnia_io

2 months ago

@sir4K_zen Got it. We can share a repo if you’d like to experiment with running this on smart glasses. We’ll be posting the full voice-input flow on the glasses themselves.


Omnia

@omnia_io

2 months ago

@EvanDataForge Not yet. Agent traceability is planned and much needed.


Omnia

@omnia_io

2 months ago

If you’re pitching “AI glasses,” ask whether your stack learns a compact predictive model of the scene. V-JEPA 2.1 is a strong citation for why JEPA-style video world models belong in the architecture, not as an optional add-on. https://t.co/m6rytYO0Bh · Meta overview: https://t.co/irqI463Xa0 · Code: https://t.co/dJFeRcDBk6


Omnia

@omnia_io

2 months ago

V-JEPA 2.1 (Meta FAIR) is a video JEPA that learns dense representations from pixels: a predictive objective in latent space (not “generate every frame”), with deep self-supervision and a unified image+video encoder—so the model sees fine spatial structure and how it evolves over time. For us (smart glasses): that’s the right kind of world model for egocentric wear—anticipation, depth-aware AR, stable perception under head motion, and a path to distilled on-device variants. Thread on what that means in practice ↓


Omnia

@omnia_io

2 months ago

9/ A caveat builders should own: glass-grade latency still implies tethering or hybrid designs for the biggest variants today. But the representation you want on the device is exactly this flavor of predictive, dense, video-native model, not text-token physics.
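The on-device vs. tethered vs. hybrid trade-off comes down to a per-frame latency budget. A minimal decision sketch, assuming you have measured the budget, on-glass inference time, host compute time and link round-trip yourself; the function name and the "hybrid" fallback policy are illustrative assumptions, and no numbers here come from the paper.

```python
def choose_placement(budget_ms: float, on_device_ms: float,
                     host_compute_ms: float, link_rtt_ms: float) -> str:
    """Decide where a perception model runs, given measured per-frame latencies.

    All inputs are values measured on your own hardware; none come from V-JEPA 2.1.
    """
    if on_device_ms <= budget_ms:
        return "on-device"
    # Tethered: the frame goes over the link to a phone or PC, the result comes back.
    if link_rtt_ms + host_compute_ms <= budget_ms:
        return "tethered"
    # Neither fits: run a small distilled model per frame on the glasses,
    # and consult the large model asynchronously off-device.
    return "hybrid"
```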


Omnia

@omnia_io

2 months ago

8/ The ViT-G scale shows the ceiling, but the distilled ViT-B / ViT-L variants are the manifesto point: thermal and compute budgets on the face are real. The research program is explicitly edge-tractable if you architect for distillation, not only for cloud giants.
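"Architecting for distillation" usually means training a small student to match a large frozen teacher's features. A toy sketch of a generic feature-distillation objective (mean squared error after a learned linear projection from student width to teacher width); this is a common technique named plainly here, not the V-JEPA 2.1 distillation recipe, and the tiny dimensions are illustrative.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # teacher_dim rows, each of length student_dim


def project(proj: Matrix, vec: Vector) -> Vector:
    """Linear map from student feature width to teacher feature width."""
    return [sum(w * x for w, x in zip(row, vec)) for row in proj]


def feature_distill_loss(teacher_feats: List[Vector], student_feats: List[Vector],
                         proj: Matrix) -> float:
    """Mean squared error between frozen teacher features and projected student features."""
    total, count = 0.0, 0
    for t_tok, s_tok in zip(teacher_feats, student_feats):  # one pair per token
        for t, p in zip(t_tok, project(proj, s_tok)):
            total += (t - p) ** 2
            count += 1
    return total / count
```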


Omnia

@omnia_io

2 months ago

7/ Action and dynamics understanding (e.g. on standard video benchmarks) supports hands-busy tasks like cooking, assembly and repair, where the camera shakes and the task is procedural. A world model plus step-aware UI beats a raw LLM staring at JPEGs.


Omnia

@omnia_io

2 months ago

6/ Strong depth from video (the paper reports large gains on benchmarks like NYUv2) matters for AR occlusion, pinning and plausibility, reducing the need to ship heavy depth hardware in every SKU (trade-offs remain: metric vs. semantic depth).
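Occlusion is the most direct use of per-pixel depth in AR: a virtual pixel is hidden wherever the real scene is closer to the viewer. A minimal sketch, assuming a scene depth map already aligned to the display and in the same metric units as the virtual content (a relative or "semantic" depth map would first need scale alignment, which is exactly the trade-off the tweet mentions). The function name and `eps` tolerance are illustrative.

```python
from typing import List, Optional

DepthMap = List[List[float]]               # estimated scene depth, meters
VirtualDepth = List[List[Optional[float]]]  # None where no virtual content is drawn


def occlusion_mask(scene_depth: DepthMap, virtual_depth: VirtualDepth,
                   eps: float = 0.01) -> List[List[bool]]:
    """True where the real scene is in front of the virtual object (hide that pixel)."""
    return [
        [vd is not None and sd + eps < vd for sd, vd in zip(srow, vrow)]
        for srow, vrow in zip(scene_depth, virtual_depth)
    ]
```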


Omnia

@omnia_io

2 months ago

5/ On Ego4D, it advances interaction anticipation: predicting which object the user will engage with next. That is the UX layer it unlocks for GenUI: highlight the right thing before the hand moves, not after.
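Turning anticipation scores into a highlight needs a little UI logic on top of the model. A minimal sketch, assuming some anticipation head outputs per-object probabilities (the dictionary input, threshold and margin values are hypothetical). The margin keeps the current highlight unless a rival clearly wins, so the HUD does not flicker as the head moves.

```python
from typing import Dict, Optional


def pick_highlight(scores: Dict[str, float], current: Optional[str] = None,
                   threshold: float = 0.5, margin: float = 0.1) -> Optional[str]:
    """Choose which object to highlight before the hand moves, or None if unsure."""
    if not scores:
        return None
    best, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_score < threshold:
        return None  # not confident enough to draw attention to anything
    # Hysteresis: stay on the current object unless the new best beats it by `margin`.
    if (current is not None and current != best
            and scores.get(current, 0.0) >= threshold
            and best_score - scores[current] < margin):
        return current
    return best
```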


Omnia

@omnia_io

2 months ago

4/ Unified visual memory (a shared encoder for images and video) is huge for glasses: the same representation serves a still glance and continuous wear. That’s how you get coherent “where was that / what changed?” behavior without juggling two vision backends.
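One common way to share a single encoder between stills and video is to treat a still image as a one-frame clip, so both land in the same embedding space. The toy sketch below illustrates only that interface idea; its "encoder" (mean intensity plus mean frame-to-frame change) is a deliberately trivial stand-in, and it does not claim this is how V-JEPA 2.1 handles images internally.

```python
from typing import List, Sequence, Union

Frame = List[List[float]]  # toy grayscale frame: rows of pixel values
Clip = Sequence[Frame]


def _frame_mean(frame: Frame) -> float:
    return sum(map(sum, frame)) / sum(len(row) for row in frame)


def encode(x: Union[Frame, Clip]) -> List[float]:
    """One entry point for stills and video: a still image is a one-frame clip."""
    is_image = isinstance(x[0][0], (int, float))
    frames = [x] if is_image else list(x)
    means = [_frame_mean(f) for f in frames]
    appearance = sum(means) / len(means)
    # Temporal change is 0.0 for a single frame, so stills and video share one space.
    diffs = [abs(b - a) for a, b in zip(means, means[1:])]
    motion = sum(diffs) / len(diffs) if diffs else 0.0
    return [appearance, motion]
```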


Omnia

@omnia_io

2 months ago

3/ Deep self-supervision applies the objective at multiple encoder layers, not just the last one. Translation for product: you get usable signal throughout the stack, which matters when you distill or prune for on-device.
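Applying the objective at several depths means the training loss is a weighted combination of per-layer prediction losses. A toy sketch, assuming an L1 distance in latent space and equal per-layer weights by default; the layer names, weights and tiny vectors are illustrative, not the paper's configuration.

```python
from typing import Dict, List, Optional


def l1(a: List[float], b: List[float]) -> float:
    """Mean absolute difference between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def deep_ssl_loss(pred_by_layer: Dict[str, List[float]],
                  target_by_layer: Dict[str, List[float]],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Apply the same latent-prediction loss at several encoder depths and combine them."""
    layers = list(pred_by_layer)
    if weights is None:
        weights = {k: 1.0 / len(layers) for k in layers}  # equal weighting by default
    return sum(weights[k] * l1(pred_by_layer[k], target_by_layer[k]) for k in layers)
```

Because intermediate layers receive a direct training signal, truncating or pruning the encoder later still leaves layers that were trained to produce useful features.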


Omnia

@omnia_io

2 months ago

2/ The dense predictive loss pushes the model to predict features for all tokens (masked and visible), rather than a patchwork where unmasked regions go blurry. For glasses, that means stable, localized perception under head motion, not smeared “video vibes.”
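The difference between a masked-only objective and a dense one is which tokens contribute to the loss. A toy sketch, assuming an L1 distance between predicted and target latent features (the distance choice and the tiny one-dimensional features are assumptions, not the paper's exact formulation):

```python
from typing import List


def predictive_loss(pred: List[List[float]], target: List[List[float]],
                    masked: List[bool], dense: bool = True) -> float:
    """Latent-space L1 loss over tokens.

    dense=False: only masked tokens contribute (classic masked prediction).
    dense=True: visible tokens contribute too, so they also get a training signal.
    """
    idx = [i for i in range(len(pred)) if dense or masked[i]]
    per_token = [
        sum(abs(p - t) for p, t in zip(pred[i], target[i])) / len(pred[i])
        for i in idx
    ]
    return sum(per_token) / len(per_token)
```

In the usage below, only the visible token has an error, so a masked-only loss reports zero while the dense loss penalizes it: `predictive_loss([[0.0], [1.0], [0.0]], [[0.0], [0.0], [0.0]], [True, False, True], dense=False)` is `0.0`, and with `dense=True` it is one third.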


Omnia

@omnia_io

2 months ago

1/ V-JEPA 2.1 learns dense spatio-temporal video representations: not just “what’s happening” globally, but local structure (boundaries, depth cues, fine detail) grounded in where things are in space and time. That’s the right object for egocentric wearables.
