Danny Bickson @BicksonDanny - Twitter Profile

almost 3 years ago

Here are some of the quality issues that you may find:
‣ Duplicates
‣ Outliers
‣ Mislabels
‣ Corrupted images
‣ Train/test leakage
‣ Overly bright/dark/blurry images

Notebooks:
‣ Kaggle Notebook - https://t.co/ZXhQ0KnrC8
‣ Colab Notebook - https://t.co/IfCbUpKPKo
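To make two of the issues listed above concrete (duplicates and train/test leakage), here is a minimal sketch that flags byte-identical files by hashing their contents. It is an illustration only and not how fastdup works: it catches exact copies, not near-duplicates, and the function names (`file_digest`, `exact_duplicates`, `train_test_leakage`) are hypothetical helpers invented for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def exact_duplicates(paths):
    """Return groups (lists) of two or more byte-identical files."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(Path(p))
    return [group for group in groups.values() if len(group) > 1]


def train_test_leakage(train_paths, test_paths):
    """Return test files whose exact bytes also appear in the training set."""
    train_digests = {file_digest(p) for p in train_paths}
    return [Path(p) for p in test_paths if file_digest(p) in train_digests]
```

Resized, re-encoded, or cropped copies hash differently, so a check like this only covers the simplest case of the duplicate and leakage problems.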


Danny Bickson @BicksonDanny

about 3 years ago

@nietras1 we have just packaged Meta's #dinov2 model using #fastdup. It should be super easy to run:
import fastdup
fd=fastdup.create(input_dir=<your images>, work_dir=<output folder>)
https://t.co/yzXjUTj2S8(model_path='dinov2s')
fd.vis.component_gallery()

LAION results: https://t.co/4AI7A2Y7x2
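The snippet in the tweet can be wrapped into a small function, sketched below. One assumption is made: the shortened link before `(model_path='dinov2s')` appears to be an auto-linked method call, and the sketch reads it as `fd.run`. The wrapper name `build_dinov2_gallery` and its parameters are hypothetical, and fastdup must be installed separately for the call to work.

```python
def build_dinov2_gallery(input_dir: str, work_dir: str):
    """Run fastdup with the packaged DINOv2 model, then render a component gallery."""
    # Imported lazily so the function can be defined without fastdup installed.
    import fastdup

    fd = fastdup.create(input_dir=input_dir, work_dir=work_dir)
    # Assumption: the shortened link in the tweet stands in for `fd.run`.
    fd.run(model_path="dinov2s")
    # Render the gallery of image components, as in the tweet.
    fd.vis.component_gallery()
    return fd
```

Calling it with an image folder and an output folder, for example `build_dinov2_gallery("images/", "fastdup_out/")`, would mirror the three steps in the tweet; both paths here are placeholders.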


Danny Bickson @BicksonDanny

over 3 years ago

@Eric_Wallace_ Thanks for featuring the title image of our GitHub repo, fastdup! https://t.co/Qcn2yNFIae Everyone should try us out for deduplicating large-scale image repos. It is free!


BicksonDanny retweeted

Eric Wallace @Eric_Wallace_

over 3 years ago

See our paper for a lot more technical details and results.

Speaking personally, I have many thoughts on this paper. First, everyone should de-duplicate their data as it reduces memorization. However, we can still extract non-duplicated images in rare cases! [6/9] https://t.co/5fy8LsNbjb



Danny Bickson @BicksonDanny

over 3 years ago

@Suhail Hi @Suhail we are building exactly that, and coincidentally you already used us! https://t.co/XWqBBDspJt We would love to chat if you are open to it and explore collaboration.


Danny Bickson @BicksonDanny

over 3 years ago

@Suhail Hi @Suhail, I am the co-creator of fastdup. It is great to learn that you managed to clean 500,000 images in one hour, including learning our tool. We would love to hear what you are doing with images and see if we can help in any way!

