Seungwook Kim @1ndependentGrad - Twitter Profile

Hirokatsu Kataoka | 片岡裕雄 #CVPR2026

2 days ago

We've uploaded the slides from the #CVPR2026 VGI (Visual General Intelligence) Workshop.

Robert Geirhos
Are generative video models the path towards solving visual intelligence?
https://t.co/mgXhMEd51r


2

81

13

73

7K

1ndependentGrad retweeted

0xkato

@0xkato

2 days ago

It reached #1 on Hacker News 😀

18

2K

75

2K

429K

1ndependentGrad retweeted

Chrome

@0xchromium

2 days ago

Andrej Karpathy spent 2h showing how he actually uses AI day to day

he's a co-founder of OpenAI and led AI at Tesla, so when he shows how he works, it’s worth watching

and the whole session is just him telling the machine what he wants in simple terms, like he's briefing a coworker

watch what's actually happening the entire time:
> he describes the task in normal words
> it goes off and does the work
> he glances at the result and nudges it with one more sentence

that's the whole skill, and you've had it since you learned to talk

the only gap between that and a worker that runs on its own is handing that sentence a schedule and the tools to act

check his work, then build the version that keeps working when you stop

116

10K

1K

29K

2M

1ndependentGrad retweeted

Jon Barron

@jon_barron

1 day ago

If your conference talk is good, you should upload it somewhere (I use YouTube). If your talk is not good enough for you to feel comfortable uploading it somewhere, then it is *definitely* not good enough to present at the conference, and you should fix it.

2

176

6

13

22K

1ndependentGrad retweeted

Peyman Milanfar

@docmilanfar

2 days ago

A year ago I argued ML saved CV. CVPR’26 best paper D4RT shows how vision isn’t being blindly assimilated by AI - it’s undergoing architectural consolidation. Unifying tracking, depth&pose into a 4D model speeds up pose estimation 100x. Geometric vision remains alive & relevant

2

179

12

81

26K

1ndependentGrad retweeted

Xiuyu Li

@sheriyuo

2 days ago

https://t.co/aJEXibSlmN

16

2K

165

4K

271K

1ndependentGrad retweeted

Google Research

@GoogleResearch

3 days ago

Introducing D4RT: A unified AI model for 4D scene reconstruction and tracking across space and time. 🎯 Catch the demo with Skanda Koppula at 12 pm at our #CVPR2026 Google booth kiosk! https://t.co/p6SclNe1zi @GoogleDeepMind


19

1K

137

714

74K

1ndependentGrad retweeted

Zhuang Liu

@liuzhuang1234

8 days ago

Excited to share VLM³ - standard VLMs go surprisingly far in 3D!

3

183

13

117

24K

1ndependentGrad retweeted

Yongyuan Liang

@cheryyun_l

11 days ago

I have to say I totally agree that VLA training is a multi-objective optimization problem balancing VL capacity and action/trajectory generation...

5

174

11

108

25K

1ndependentGrad retweeted

Dmytro Mishkin 🇺🇦

@ducha_aiki

10 days ago

Short history of last 6 years of image matching (all by transformers):
2020: @pesarlin SuperGlue
2023: @Vinc3nt_Leroy DUSt3R
2024: @Parskatt RoMa
2025: @jianyuan_wang VGGT
2026: @davnords "hold my beer" (scales LightGlue)
2025: @jianyuan_wang "no, you hold MY beer" (scales VGGT)

3

211

31

118

14K

1ndependentGrad retweeted

Andrej Karpathy

@karpathy

28 days ago

This works really well btw, at the end of your query ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.

More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Around a third of our brains are a massively parallel processor dedicated to vision, it is the 10-lane superhighway of information into the brain. As AI improves, I think we'll see a progression that takes advantage:

1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default
...4,5,6,...
n) interactive neural videos/simulations

Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral https://t.co/z21CP5iQfu

There are also improvements necessary and pending at the input. Neither audio nor text nor video alone is enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.

TLDR The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip: try asking for HTML.

1K

19K

2K

21K

4M
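The HTML tip in the post above can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the function names, the `response.html` filename, and the exact wording of the appended instruction are hypothetical, and the call to an actual LLM is left out because the post does not name one.

```python
import tempfile
import webbrowser
from pathlib import Path
from typing import Optional

# Assumed wording, adapted from the quoted instruction in the post.
HTML_SUFFIX = "Structure your response as HTML."


def with_html_request(query: str) -> str:
    """Append the HTML instruction to the end of a query."""
    return f"{query.rstrip()}\n\n{HTML_SUFFIX}"


def save_response(html: str, directory: Optional[str] = None) -> Path:
    """Write the model's HTML reply to a file and return its path."""
    target_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / "response.html"  # hypothetical filename
    path.write_text(html, encoding="utf-8")
    return path


def view_response(html: str) -> Path:
    """Save the reply and open it in the default browser."""
    path = save_response(html)
    webbrowser.open(path.resolve().as_uri())
    return path
```

Send `with_html_request("...")` to whichever model you use, then pass the reply to `view_response` to open it locally.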

1ndependentGrad retweeted

Xiuyu Li

@sheriyuo

about 1 month ago

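As someone who claims to know a thing or two about RL, this tutorial put me to shame (lol). Learning LLMs takes hands-on tutorials: start from a single objective and work through the code until you understand it, and from then on you'll be able to review AI-written code properly. Please follow and support the original author @sanbuphy. English version is coming soon.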

7

697

104

795

98K

1ndependentGrad retweeted

Vidit Ostwal

@ViditOstwal

about 1 month ago

After reading this blog by @willccbb, I fell down a rabbit hole on On-Policy Distillation. Here's my breakdown: the problem, the existing fixes, why they fall short, and what OPD actually changes. A thread 🧵

1

81

10

119

15K

1ndependentGrad retweeted

Turing Post

@TheTuringPost

about 1 month ago

There’s a serious gap in multimodal models – they work with images, but still reason in language, which isn’t that precise for visual stuff.

@deepseek_ai just dropped an idea to solve this: let the model literally point to exact locations in the image while it thinks.

They call it "Thinking with Visual Primitives."

These visual primitives are:

- points (specific locations)
- bounding boxes (areas in the image)

Using them, the model knows what exactly it’s referring to and achieves ~77% accuracy on average (vs. 76.5% for Gemini 3 Flash and 71.1% for GPT-5.4)

Plus, only ~80–90 visual tokens are kept in memory after compression thanks to the efficient architecture

Here is how it works:


11

497

81

319

32K

1ndependentGrad retweeted

そう｜Claude Codeで始めるAI自動化

@so_ainsight

about 1 month ago

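Genuinely shocking. Design rules from 2,000 of the world's top sites have been released for free for AI agents.

"AI-built UIs look unpolished because the AI has never seen good design." Refero Styles is the service that grew out of that concern.

- A single file (DESIGN(.)md) that codifies color, typography, spacing, and layout as rules
- Covers 2,000 popular services in a form AI can read directly
- Keyword search is available, and it's all free

We've reached the point where you can tell an AI "make it look like that one" and it understands 👇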

3

690

73

1K

60K

1ndependentGrad retweeted

DAIR.AI

@dair_ai

about 1 month ago

The Top AI Papers of the Week (April 26 - May 3)

- Latent Agents
- RecursiveMAS
- OneManCompany
- AgenticQwen-30B-A3B
- Agentic World Modeling
- Agentic Harness Engineering
- From Skill Text to Skill Structure

Read on for more:

8

220

46

238

32K

1ndependentGrad retweeted

Tabassum Parveen

@Tabbu_ai

about 1 month ago

Instead of watching an hour of Netflix, watch this 2-hour Stanford lecture. It will teach you more about how LLMs like ChatGPT and Claude are actually built than most people in top AI companies learn across their entire careers. Save this.

10

84

17

95

13K

1ndependentGrad retweeted

Jouhatsu | AI Influence Operator

@Jouhatsu_ai

about 1 month ago

Anthropic just officially published the blueprint for building a company with Claude Code, and it's mind-blowing 😭

CEO: 1 human (asleep)
Employees: several AIs
Operations: the AIs split up the tasks and move forward on their own

Work is literally dying...

I've summarized the complete guide in French, read it when you have 5 minutes ⤵️

If you want AI to work while you sleep → bookmark this 🔖

45

2K

368

7K

2M

1ndependentGrad retweeted

huangserva

@servasyy_ai

about 1 month ago

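Anthropic offers engineers who can build an LLM architecture from scratch more than $750,000 a year. Stanford covers the whole principle in a single one-hour lecture, and it's publicly available for free.

Key takeaways:

1. The original Transformer was basically right architecturally; the main changes are norm placement, removing bias, and GLU activations
2. Architecture choices are a complex trade-off among expressiveness, training efficiency, and stability
3. Hyperparameter choices have a forgiving range; following the conventional defaults is fine
4. Stability has become a more important design consideration than expressiveness (the higher the training cost, the more critical it is)
5. Inference efficiency (KV cache) has driven the wide adoption of GQA and hybrid attention
6. If you have stability problems, sprinkle in Layer Norm; absurd, but proven to work

Bookmark it and watch it today, before it gets taken down.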

6

841

196

1K

134K
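Point 5 in the post above links KV-cache cost to grouped-query attention (GQA). The arithmetic can be sketched as follows. The model shape used here (32 layers, 32 query heads, head dimension 128, 4096 tokens, 2-byte values, 8 shared KV heads) is a hypothetical example and is not taken from the lecture or the post.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def kv_head_for_query(query_head, n_query_heads, n_kv_heads):
    """Index of the shared KV head that a query head reads under GQA."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


# Hypothetical model shape, chosen only for illustration.
LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 128, 4096

# Standard multi-head attention: one KV head per query head.
mha_bytes = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# GQA with 8 shared KV heads: every 4 query heads read the same K and V.
gqa_bytes = kv_cache_bytes(LAYERS, 8, HEAD_DIM, SEQ_LEN)

print(f"MHA cache: {mha_bytes / 2**20:.0f} MiB")  # 2048 MiB
print(f"GQA cache: {gqa_bytes / 2**20:.0f} MiB")  # 512 MiB
```

With these assumed numbers the cache shrinks by the ratio of query heads to KV heads (4x), which is the inference saving the post refers to.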

1ndependentGrad retweeted

Corey Ganim

@coreyganim

about 1 month ago

The personal knowledge base build, in 60 seconds:

Total setup: 45 minutes this weekend. Then it compounds forever.

1. 5 minutes: Setup
Create 3 folders: raw/, wiki/, outputs/. Drop a CLAUDE.md schema file in the root. Done.

2. 10 minutes: Dump
Copy-paste articles, notes, screenshots, meeting transcripts into raw/. Don't rename. Don't organize.

3. 30 minutes: Let the AI build
Point Claude at the folder. "Read everything in raw/. Compile a wiki following CLAUDE.md rules. Create INDEX.md first."

Walk away. Come back to organized articles, [[linked]] topics, and a searchable index.

4. Ongoing: The compounding loop
Ask questions. Save answers back to raw/. Every query makes the next answer better.

5. Monthly: Health check
Tell the AI to flag contradictions, find unexplained topics, and suggest 3 new articles to fill gaps.

The system gets smarter the longer you use it.

Day 1 it's basic. Day 90 it's a company asset nobody else has.


41

1K

148

3K

189K
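Step 1 of the knowledge base post above (three folders plus a CLAUDE.md file in the root) can be scripted. This is a minimal sketch: the `scaffold` function name and the placeholder text written into CLAUDE.md are assumptions, since the post does not show what the schema file contains.

```python
from pathlib import Path

FOLDERS = ("raw", "wiki", "outputs")

# Placeholder only: the post does not specify the schema's contents.
SCHEMA_PLACEHOLDER = (
    "# Wiki rules\n\n"
    "(Placeholder: describe how notes in raw/ become articles in wiki/.)\n"
)


def scaffold(root):
    """Create raw/, wiki/, outputs/ and a CLAUDE.md file under root."""
    root = Path(root)
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    schema = root / "CLAUDE.md"
    # Never overwrite a schema file the user has already written.
    if not schema.exists():
        schema.write_text(SCHEMA_PLACEHOLDER, encoding="utf-8")
    return root
```

Calling `scaffold("my-kb")` is safe to repeat: existing folders are kept and an existing CLAUDE.md is left untouched.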

