VLM Run @vlmrun - Twitter Profile

Pinned Tweet

VLM Run

@vlmrun

2 months ago

Chat with Orion – the first visual agent that sees, reasons, and acts across images, videos, and documents.

VLM Run

@vlmrun

5 days ago

@deeperflows Yes! CLIs/TUIs, stdout and pipe support is all we need for intelligence. 😀

VLM Run

@vlmrun

9 days ago

Introducing mm-ctx: A fast, multimodal context manager for your agents.

VLM Run

@vlmrun

5 days ago

@MasterMuskan22 https://t.co/jzmTr1AKQf

VLM Run

@vlmrun

5 days ago

@_cmd8 No, we haven’t open-sourced it on GitHub yet, but you can install it from PyPI. We only ship a prebuilt wheel because we also use Rust under the hood. https://t.co/jzmTr1AKQf

VLM Run

@vlmrun

5 days ago

@PatronusBen Not yet, but what do you mean here? Guardrails?

VLM Run

@vlmrun

5 days ago

@jeremyparkphd 🔥🔥🔥

VLM Run

@vlmrun

5 days ago

@JustJerry121 @jeremyparkphd Check out the docs: https://t.co/jzmTr1AKQf

VLM Run

@vlmrun

5 days ago

@JustJerry121 @jeremyparkphd We’re building mm-ctx for richer multi-modal context management for LLMs/VLMs. There’s a ton of open-source projects around context management, but they fail miserably when you’re looking to build multimodal agents.

VLM Run

@vlmrun

6 days ago

If you're building visual agents and you're at #CVPR2026 this year, come find us. Get a sneak peek at the next generation of @vlmrun's visual agent Orion. We'll be announcing it and demoing live for the first time. If you're in Denver, stop by the meetup Thursday night. Details and signup in the original post from @jeremyparkphd.

Jeremy Park, PhD

@jeremyparkphd

6 days ago

@vlmrun is hosting a Visual Agents Meetup in Denver during CVPR!

We've been building visual coding agents and releasing open-source tools like mm-ctx (multimodal context for agents), and are looking to connect with others working in this space. Stop by Thursday night to share what you're working on and see our new visual agent in action.

If you're attending CVPR, feel free to reach out! Happy to connect during the week. Please share this with colleagues who will be at CVPR!

📅 Thursday, June 4th
🕕 6-8 PM MDT
📍 Downtown Denver

Hosted by: @vlmrun, @dineshredy, @jeremyparkphd
Signup link in the comments 👇

VLM Run

@vlmrun

8 days ago

@ravixpanchal Absolutely!

VLM Run

@vlmrun

24 days ago

💬 Discord: https://t.co/DJ0jYXmTx5
📦 PyPI: https://t.co/8lgVJNJYfZ
🔧 mm-cli-skills: https://t.co/X7EGDKdQrX

VLM Run

@vlmrun

24 days ago

Excited to share that mm-ctx is now live on @huggingface Spaces! Try it in the browser via an interactive terminal without installing anything: https://t.co/UgRLJDPHB8

mm-ctx – fast, multimodal context for agents.

LLM-based agents handle text fine, but as soon as a directory contains images, videos, or PDFs with visual content, they struggle to understand the full context.

mm-ctx is meant to feel familiar: the Unix tools we already love (find/cat/grep/wc), rebuilt for file types LLMs can't read natively and designed to work with agents via the CLI.
- mm grep "invoice #1234" ~/Downloads searches across PDFs and returns line-numbered matches
- mm cat <document>.pdf returns a metadata description of the file
- mm cat <photo>.jpg returns a caption of the photo
- mm cat <video>.mp4 returns a caption of the video

A few things we obsessed over:
⚡ Speed: Rust core for the hot paths
🏠 Local-first, BYO model: Uses any OpenAI-compatible endpoint: Ollama, vLLM/SGLang, LMStudio with any multimodal LLM (Gemma4, Qwen3.5, GLM-4.6V).
🔗 Composable: stdin + structured outputs
🤖 Drops into any agent via mm-cli-skills: Claude Code, Codex, Gemini CLI, OpenClaw.

We’d love to hear your feedback! Especially on the CLI and what file types and workflows you would like to see next.
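
The `mm grep` and `mm cat` invocations in the tweet above can also be driven from a script. What follows is a minimal Python sketch, assuming the `mm` executable from the mm-ctx package is on PATH. The helper names `build_mm_command` and `run_mm` are hypothetical, and nothing is assumed about mm's output format or about flags the tweet does not show.

```python
import os
import shlex
import shutil
import subprocess
from typing import List, Optional

# Only the two subcommands shown in the tweet are allowed here.
KNOWN_SUBCOMMANDS = {"cat", "grep"}


def build_mm_command(subcommand: str, *args: str) -> List[str]:
    """Return an argv list such as ["mm", "cat", "report.pdf"]."""
    if subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unsupported mm subcommand: {subcommand!r}")
    return ["mm", subcommand, *args]


def run_mm(subcommand: str, *args: str, timeout: float = 120.0) -> Optional[str]:
    """Run mm and return its stdout as text, or None if mm is not installed."""
    if shutil.which("mm") is None:
        return None
    completed = subprocess.run(
        build_mm_command(subcommand, *args),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


if __name__ == "__main__":
    # No shell is involved, so "~" has to be expanded explicitly.
    downloads = os.path.expanduser("~/Downloads")
    print(shlex.join(build_mm_command("grep", "invoice #1234", downloads)))
```

Passing an argv list instead of a shell string keeps a pattern such as "invoice #1234" intact as a single argument, so the "#" and the space need no manual quoting.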

VLM Run

@vlmrun

24 days ago

🤗 Hugging Face Spaces: https://t.co/HUCXTtM7s8 Interact with mm live on @huggingface spaces.

vlmrun retweeted

Jeremy Park, PhD

@jeremyparkphd

about 1 month ago

I made a rock climbing tool using computer vision!

I prompted @vlmrun's visual agent Orion to segment all of the blue bouldering holds, and it did a good job! It is interesting that now we can prompt VLMs to segment all of the holds, rather than creating a new dataset from scratch to train a model.

With holds detection + pose estimation, I can show how each hold gets activated as a hand or foot uses it. Once we touch the final hold with both hands, the route is completed, and I show the overall path of my torso midpoint.

A tool like this could help climbers understand their movement better. I’m still very much a beginner at bouldering, so I could use all the help I can get 🤣

There are definitely things to improve, but overall I’m encouraged by this first demo 🙂 Let me know what you think in the comments!

Models used:
- @vlmrun's Orion for segmentation
- ViTPose+ Huge for pose estimation (via @huggingface 🤗)
- RT-DETR for person detection (via @huggingface 🤗)

Shoutout to Daniel Reiff and his bouldering + computer vision project for the inspiration!
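
The hold-activation logic described in this tweet can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's implementation: holds are simplified to axis-aligned bounding boxes (the tweet uses segmentation), each frame is a dictionary of wrist and ankle keypoints in pixel coordinates, and the names `Hold` and `track_route` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Hold:
    name: str
    box: Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max (pixels)

    def contains(self, point: Point, margin: float = 0.0) -> bool:
        """True if the point lies inside the box grown by `margin` pixels."""
        x, y = point
        x0, y0, x1, y1 = self.box
        return x0 - margin <= x <= x1 + margin and y0 - margin <= y <= y1 + margin


def track_route(
    holds: List[Hold],
    final_hold: str,
    frames: List[Dict[str, Point]],
    margin: float = 10.0,
) -> Tuple[List[str], Optional[int]]:
    """Return the holds in order of first activation and the completing frame index.

    A hold is activated the first time any limb keypoint falls inside it.
    The route is complete once both wrists are inside the final hold.
    """
    activated: List[str] = []
    for index, keypoints in enumerate(frames):
        for hold in holds:
            touching = {limb for limb, p in keypoints.items() if hold.contains(p, margin)}
            if touching and hold.name not in activated:
                activated.append(hold.name)
            if hold.name == final_hold and {"left_wrist", "right_wrist"} <= touching:
                return activated, index
    return activated, None


holds = [Hold("start", (0, 0, 10, 10)), Hold("top", (100, 100, 110, 110))]
frames = [
    {"left_wrist": (5, 5), "right_wrist": (50, 50)},
    {"left_wrist": (105, 105), "right_wrist": (50, 50)},
    {"left_wrist": (105, 105), "right_wrist": (104, 106)},
]
order, finished_at = track_route(holds, "top", frames, margin=0.0)
print(order, finished_at)
```

A pipeline built on segmentation masks would replace the box test with a point-in-mask lookup; the activation and completion rules would stay the same.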

VLM Run

@vlmrun

about 2 months ago

Colab notebook: https://t.co/FJA7GEKPpv

VLM Run

@vlmrun

about 2 months ago

Manually parsing handwritten intake forms can be slow and prone to error, while VLM Run's HIPAA-ready API allows you to extract the same details in seconds.

In this tutorial by @jeremyparkphd, learn how to use VLM Run to extract structured JSON from handwritten healthcare documents at scale.

Through this walkthrough, you will learn how to:
- Upload documents in the Requests tab and run them against your saved skills
- Enable confidence scores and grounding to see exactly where each field came from in the original document
- Edit incorrect extractions and provide feedback to improve extraction over time
- Run the same workflow programmatically via the VLM Run API as shown in Google Colab

VLM Run

@vlmrun

about 2 months ago

Try chat today: https://t.co/9aF7eHyg7H

VLM Run

@vlmrun

about 2 months ago

Read the full whitepaper here: https://t.co/I39yzpsNUQ

VLM Run

@vlmrun

2 months ago

Read our skills documentation here: https://t.co/LlfHas2MUn Chat: https://t.co/9aF7eHyg7H

VLM Run

@vlmrun

2 months ago

Announcing Orion Skills! 🚀

Rather than rewriting prompts every time you want to define a specific task, you can now package all of that knowledge into a reusable skill.

Why skills?
- Reusable: Create a skill once, reference it from any endpoint (image, document, video, audio, agent)
- Versionable: Pin a specific skill version for reproducible results, or use "latest" to always get the newest revision
- Composable: Pass multiple skills in a single request, or combine them with custom schemas

Unlike purely text-based skills, we have reimagined what skills mean for visual agents and how to codify visual workflows into skills.

Try skills in chat today! And check out this skills creation tutorial by @jeremyparkphd 👇
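
The difference between pinning a skill version and using "latest" can be illustrated with a small Python sketch. It is not the VLM Run API: the in-memory registry, the skill name, the version strings, and the function `resolve_skill_version` are all hypothetical and only show why a pinned version stays reproducible while "latest" moves with each new revision.

```python
from typing import Dict, List


def resolve_skill_version(
    registry: Dict[str, List[str]], name: str, version: str = "latest"
) -> str:
    """Resolve a requested version against a registry of published revisions.

    `registry` maps a skill name to its revisions, ordered oldest to newest.
    """
    revisions = registry[name]
    if version == "latest":
        return revisions[-1]  # follows whatever was published most recently
    if version not in revisions:
        raise KeyError(f"{name} has no revision {version!r}")
    return version  # a pinned version never changes


# Hypothetical registry contents, for illustration only.
registry = {"intake-form": ["v1", "v2"]}
pinned = resolve_skill_version(registry, "intake-form", "v1")
floating_before = resolve_skill_version(registry, "intake-form")

registry["intake-form"].append("v3")  # a new revision is published
floating_after = resolve_skill_version(registry, "intake-form")
still_pinned = resolve_skill_version(registry, "intake-form", "v1")
print(pinned, floating_before, floating_after, still_pinned)
```

After the new revision is published, the pinned request still resolves to the same revision, while the "latest" request resolves to the new one.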
