Kyle Sargent @KyleSargentAI - Twitter Profile

Pinned Tweet

11 days ago

Today we released “GPIC: A Giant Permissive Image Corpus for Visual Generation.” It’s a 100M image dataset for visual generation, with text captions and 100% known+permissive licenses, hosted on HuggingFace. I’m excited to get this out! Check it out: https://t.co/6pZ66Nihgx

3

77

15

13

6K

KyleSargentAI retweeted

Justin Johnson

@jcjohnss

11 days ago

GPIC should be the new standard benchmark for generative modeling. Training 1 epoch on GPIC is the same cost as 100 epochs on ImageNet, but is a much better proxy for real-world problems. If you work in generative modeling, try GPIC for your next project!

9

100

17

34

42K

Kyle Sargent

@KyleSargentAI

11 days ago

Correction: the right cite for this table is "PixelDiT: Pixel Diffusion Transformers for Image Generation," not PixGen. Thanks :)

0

2

0

137

Kyle Sargent

@KyleSargentAI

11 days ago

One practical example is epoch count – “state-of-the-art” models on ImageNet-1K train for 300-1700 epochs (Fig. credit: PixGen). But that’s not the way you would do things outside of an academic comparison – you’d just go get more data!

2

6

0

344

Kyle Sargent

@KyleSargentAI

11 days ago

Personally: I spent a portion of my PhD working on strong tokenizers, which is sort of at odds with the current ImageNet-1K meta to add regularization whenever possible, so I’m also personally excited to see how this dataset drives tokenization research. Happy pretraining!

0

7

0

231

Kyle Sargent

@KyleSargentAI

11 days ago

I’m also proud of this section of the paper, which gives best practices for compliance with our eval protocol. Without calling out anyone in particular, let me just say that using auxiliary foundation models to get a better FD-DINOv2 on GPIC without being very up front about the huge advantages of the extra data and model FLOPs is super bad – please don’t do it!

1

10

1

456

KyleSargentAI retweeted

Keshigeyan Chandrasegaran

@keshigeyan

11 days ago

1/ Introducing GPIC: a Giant Permissive Image Corpus and benchmark for visual generation! 🚀100M VLM-captioned image-text pairs for training 📊1M image-text pairs for benchmarking 🖼️~28 trillion pixels 🤗Centrally Hosted ✅Fully permissive for research + commercial use Dataset, benchmark and models🧵👇 Co-led with @KyleSargentAI

15

366

83

226

138K

Kyle Sargent

@KyleSargentAI

12 days ago

Based on which ML papers get a lot of hype on X, I wouldn't be so sure the human moat is "research taste" xD

1

21

0

3

2K

Kyle Sargent

@KyleSargentAI

20 days ago

Genuinely cool mission and good product. Congrats!

Exa

@ExaAILabs

20 days ago

We raised $250M in Series C funding at a $2.2B valuation, led by a16z. Exa is a search lab organizing the web's data for agents.

158

2K

169

821

1M

0

15

0

2

2K

Kyle Sargent

@KyleSargentAI

20 days ago

@poolio Big fan of synthwave Ben

0

1

0

56

KyleSargentAI retweeted

David Duvenaud

@DavidDuvenaud

about 1 month ago

Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵

200

4K

451

2K

1M

Kyle Sargent

@KyleSargentAI

2 months ago

@mattdeitke 😮

0

96

KyleSargentAI retweeted

ℝussell Pekala @EryDayImRusslen

2 months ago

Today @YuzuHealthInc announces our Series A! Our mission is to bring trust and agency back to health insurance. Thank you to our customers, partners, and team who made this possible! Blog and hiring link in the comments.

33

163

30

84K
