Jack Urbanek

@JackUrbs

I'm a Founding Member of DatologyAI's technical staff, working to solve automated data curation for ML training at scale. Formerly worked at FAIR.

Joined May 2012

19 Following

442 Followers

23 Posts

JackUrbs retweeted

Ari Morcos

@arimorcos

3 months ago

Love seeing the timeline wake up to the fact that data is the most underinvested area in ML. But let’s set the record straight: the world’s premier data research company isn't hypothetical. It already exists. It’s called @datologyai, and we’ve been building it for 2.5 years. 🧵

127

28K

Jack Urbanek @JackUrbs

8 months ago

+1 - Seeing some of my great coworkers get unceremoniously dropped was part of my impetus for leaving Meta in the first place. Would love to chat about how green the grass is over at Datology, DM me if you're curious.

Matthew Leavitt

@leavittron

8 months ago

If you were impacted by the recent Meta layoffs (or even if you weren't) and you're interested in doing ambitious, rigorous science and/or engineering that powers a real product that actual customers pay us ca$h money for, please DM me or head over to https://t.co/4truVQtHWy. We're particularly interested in people that have experience with any of data curation, post-training, training stacks/infra, and data infra.

Jack Urbanek @JackUrbs

8 months ago

If I've learned anything in my last 2 years at @datologyai it's that running production-scale research opens a whole host of interesting engineering problems. This is just a peek.

JosH100

@josh_wills

8 months ago

The folks at @AmplifyPartners went deep with my team at @datologyai on the engineering challenges involved in large-scale data curation for training models, from deduplicating a non-trivial fraction of the internet to orchestrating dozens of experiments and terabytes of data every single day: https://t.co/mTh2mOmY87

103

JackUrbs retweeted

Ricardo Monti @RicardoMonti9

about 1 year ago

. @datologyai is back: state of the art CLIP model performance using data curation alone 🚀

✅ state-of-the-art ViT-B/32 performance: ImageNet 1k 76.9% vs 74% reported by SigLIP2
✅ 8x training efficiency gains
✅ 2x inference efficiency gains
✅ Public model release

Details in the 🧵 thread below 👇

149

37K

Jack Urbanek @JackUrbs

over 1 year ago

Are you a Researcher or Engineer (or something between) interested in driving results like these? Excited by the idea of pushing the frontier of AI through improvements in data? Join us! We’re hiring for full-time and internship positions: https://t.co/dQaN3aDUN2 13/13

238

Jack Urbanek @JackUrbs

over 1 year ago

Incredibly excited to be sharing what we’ve been working on at @DatologyAI. Thanks to a ton of work from an incredible team, I think we have a strong showing for the impact of our curation for LLMs: We make RPJv1 better than the best available pretraining datasets around. 🧵1/n

Jack Urbanek @JackUrbs

over 1 year ago

Closing out, we’re only just getting started on this ride, and week over week we’re pushing the frontier of generalizing curation methodologies, getting stable results from them, and putting them into production. 12/n

254

JackUrbs retweeted

Matthew Leavitt

@leavittron

over 1 year ago

🧵We’ve spent the last few months at @datologyai building a state-of-the-art data curation pipeline and I’m SO excited to share our first results: we curated image-text pretraining data and massively improved CLIP model quality, training speed, and inference efficiency 🔥🔥🔥

175

76K

JackUrbs retweeted

DatologyAI @datologyai

over 2 years ago

Hello world! We are incredibly excited to come out of stealth today to help make better data accessible to everyone, automatically. Hear from our founders about our mission and vision for DatologyAI: https://t.co/trNyGhM8jt

20K

JackUrbs retweeted

Jason Weston

@jaseweston

about 3 years ago

🚨 New work: BlenderBot 3x 🚨
- Public data release & analysis of 6M chat interactions.
- Learns by conversing with people in the real world: training on this data improves BB3 from 85.3% → 94.4% good messages.

paper: https://t.co/HOiPcbAFng
project: https://t.co/kTNr2rJyIV

272

112

117K
