Imran Khan

@ImranKBangash

Imran is a computer vision and 3D AI enthusiast with a PhD in Embedded Vision Systems.

Sweden

Joined April 2010

151 Following

71 Followers

486 Posts

ImranKBangash retweeted

11 days ago

Congrats to @GoogleDeepMind on the launch of DiffusionGemma. The model generates 256 tokens in parallel per step, delivering 150+ TPS on DGX Spark and 1,000+ TPS on a single H100. We're supporting it from day one with:
• BF16 and NVFP4 checkpoints on @huggingface🤗
• Free GPU-accelerated endpoints on https://t.co/6T0R9P7EXS
• @vllm_project support with FP8 precision

Get started with DiffusionGemma on NVIDIA: https://t.co/vurk7GCQUs

ImranKBangash retweeted

Aradhye ✈️ ICML'26

@AradhyeAgarwal

13 days ago

Mandatory repost for the CVPR audience! The cool thing is that this just predicts depth maps per frame; there is no temporal smoothing whatsoever.

ImranKBangash retweeted

Chuhan Zhang @ChuhanZhang5

16 days ago

Huge congrats to the team! D4RT is a team effort, and all the authors have worked very hard on it over the past year. Very well deserved. 🍻 And thank you to the Award Committee members for the recognition.

ImranKBangash retweeted

16 days ago

3D scene reconstructions by NVIDIA.

ArtiFixer repairs artifacts and extends sparse views via Wan 2.1:
- high-fidelity inpainting in occluded regions
- generates hundreds of consistent frames in a single pass
- 3D Gaussian Splatting for navigable scene reconstruction

It makes the 3D environment look photorealistic and fully navigable for VR/AR. It basically turns a broken 3D model into a polished, professional scene. https://t.co/weOQcfXleO

ImranKBangash retweeted

@Google

18 days ago

Today we’re introducing Gemma 4 12B — our latest open model that brings advanced agentic reasoning, vision and audio directly to your laptop.

It delivers performance nearing our larger Gemma models with a much smaller total memory footprint, while being small enough to run locally with just 16GB of VRAM. It’s open and accessible for everyone to use under a permissive Apache 2.0 license.

This is all made possible by our new, unified architecture that removes separate multimodal encoders. Here’s how we did it 🧵

ImranKBangash retweeted

@ollama

23 days ago

OpenJarvis: a local-first personal AI is now available to run with Ollama

Built by Stanford’s @HazyResearch and Scaling Intelligence labs, as part of their “Intelligence Per Watt” research into efficient local AI. @Stanford

Learn more in the blog post 👇👇👇 https://t.co/qINCXwxn3q

ImranKBangash retweeted

divyansh tiwari

@DivyanshT91162

23 days ago

LLM Wiki v0.4.16 just made knowledge graphs feel insanely fast.🤯 Huge rendering upgrades mean you can now explore massive AI knowledge maps without the lag, freezes, or clutter. Search flows smoother. Navigation feels instant. This is starting to look less like a wiki… and more like a second brain. Repo👇

ImranKBangash retweeted

23 days ago

Feed-forward 3D reconstruction methods typically predict pointmaps in camera-centric frames. But why should a camera's arbitrary orientation define the coordinate system? We introduce G3T, a transformer that predicts pointmaps in gravity-aligned frames. Regardless of input image orientation, our method always produces upright pointmaps (see demo). We leverage this uprightness to create G3T-Long, a submap-based reconstruction method that improves robustness on long-sequence 3D reconstruction (more on that below). Interactive demos, code, and model weights are available on our project page.

ImranKBangash retweeted

24 days ago

This #CVPR2026 paper from our research team is trending #1 on @HuggingFace 🤗 Meet LocateAnything: a vision-language detection model that rethinks bounding box prediction. For AI agents and robots, “seeing” is only useful if a model can pinpoint where something is fast enough to act. Trained on 138M high-quality samples, LocateAnything decodes bounding boxes in parallel instead of one coordinate at a time, improving localization accuracy while dramatically increasing throughput for visual grounding and detection. Project page: https://t.co/O7JMe8tzFM

ImranKBangash retweeted

@Heykazitarek

28 days ago

🚨 NotebookLM + Google Antigravity might be the most underrated AI combo right now.

Almost no one is using it…
But the people who are? They’re getting a massive edge.
This setup can help you:
→ Learn faster
→ Research smarter
→ Turn ideas into polished content in minutes

And it takes less than 2 minutes to set up.
Here’s exactly how to use it + what it can do 👇🧵

ImranKBangash retweeted

@LearnWithBrij

about 1 month ago

AI AGENT STACK — MASTER TREE 🌲

AI Agents
│
├── 01. Foundation Layer
│ ├── LLMs
│ │ ├── GPT-4.1
│ │ ├── Claude
│ │ ├── Gemini
│ │ └── DeepSeek
│ │
│ ├── Prompting
│ │ ├── System Prompts
│ │ ├── Few-shot
│ │ ├── Chain of Thought
│ │ └── Structured Output
│ │
│ └── Context
│   ├── Memory
│   ├── RAG
│   ├── Vector DB
│   └── Knowledge Graphs
│
├── 02. Agent Brain
│ ├── Planning
│ │ ├── Task Breakdown
│ │ ├── Goal Routing
│ │ └── Reflection
│ │
│ ├── Reasoning
│ │ ├── ReAct
│ │ ├── Tree of Thoughts
│ │ ├── Multi-Agent Debate
│ │ └── Self-Correction
│ │
│ └── Decision Engine
│   ├── Tool Selection
│   ├── Memory Retrieval
│   └── Action Prioritization
│
├── 03. Tool Layer
│ ├── Web Search
│ ├── Browser Automation
│ ├── Code Execution
│ ├── APIs
│ ├── Database Access
│ └── File Systems
│
├── 04. Agent Workflows
│ ├── Research Agents
│ ├── Coding Agents
│ ├── Sales Agents
│ ├── Customer Support Agents
│ ├── Content Agents
│ └── Autonomous Workflows
│
├── 05. Multi-Agent Systems
│ ├── Manager Agent
│ ├── Worker Agents
│ ├── Reviewer Agents
│ ├── Specialized Skills
│ └── Shared Memory Bus
│
├── 06. Infrastructure
│ ├── LangGraph
│ ├── CrewAI
│ ├── OpenAI Agents SDK
│ ├── MCP
│ ├── Docker
│ └── Kubernetes
│
├── 07. Observability
│ ├── Logs
│ ├── Traces
│ ├── Evaluations
│ ├── Hallucination Checks
│ └── Cost Monitoring
│
├── 08. Security Layer
│ ├── Sandboxing
│ ├── Permission Control
│ ├── Secret Management
│ ├── Guardrails
│ └── Human Approval Loops
│
└── 09. Future of Agents
  ├── Voice Agents
  ├── Computer Use
  ├── AI Employees
  ├── Self-Improving Agents
  └── Autonomous Companies

Most people use AI like a chatbot.

The next generation will use AI like an operating system.

ImranKBangash retweeted

@tom_doerr

30 days ago

Efficient 3D city reconstruction with Gaussian Splatting https://t.co/hXUNKQyg9Y https://t.co/zomXwyHpHI

ImranKBangash retweeted

AshutoshShrivastava

@ai_for_success

29 days ago

Antigravity CLI Useful Commands Cheat Sheet. https://t.co/IylqIeHluv

ImranKBangash retweeted

@jamescoder12

30 days ago

🚨 NotebookLM + Google Antigravity is one of the most powerful combos available right now—and almost no one is using it.

If you’re not taking advantage of this, you’re missing out on serious leverage.

Here’s how to set it up in 2 minutes + what it can do 👇 https://t.co/gXxffDbX2h

ImranKBangash retweeted

about 1 month ago

PanoWorld. An interesting way to use Qwen-Edit. It converts 2D floor plans into photorealistic, consistent VR home tours. Great for real estate and interior designers: it lets you walk through a home that hasn’t been built or furnished yet. It ensures seamless 360° views via CPRoPE. https://t.co/rWDnjaqE5k

ImranKBangash retweeted

about 1 month ago

ComfyUI wrapper for LiTo. Single RGBA image to 3D Gaussians in ComfyUI https://t.co/RqR8v92DA7

Imran Khan @ImranKBangash

about 1 month ago

Persistent memory was step one. Now comes always-on, asynchronous, event-driven agents. The hard part isn’t intelligence anymore — it’s reliability, permissions, coordination, and knowing when not to act.

ImranKBangash retweeted

Phuc Nguyen Duc Anh @phucnda

about 1 month ago

We are excited to release the code for our paper OpenVO: Open-World Visual Odometry with Temporal Dynamics Awareness, accepted to #CVPR2026. Source Code: https://t.co/na6fLrmzE8 From dashcam video, OpenVO estimates the vehicle camera's motion at metric scale.

ImranKBangash retweeted

Guillermo Casaus

@_guillecasaus

about 1 month ago

🚨 Google has just released its official skills for AI agents.

It has published 13 skills compatible with Claude Code, Cursor, Copilot, and other agents.

They let agents carry out advanced tasks and automate complex workflows.

It's free and open-source 👇

ImranKBangash retweeted

about 1 month ago

Pixal3D (SIGGRAPH’26): a new image-to-3D paradigm for high-fidelity 3D asset creation. Demo, code, paper available!
