Last week @Trevornoah asked @OpenAI @miramurati:
How can we safeguard against AI-powered photo editing for misinformation?
MIT students hacked a way to "immunize" photos against edits: https://t.co/fcCb7VSzEj
@aleks_madry
Last week on @TheDailyShow, @Trevornoah asked @OpenAI @miramurati a (v. important) Q: how can we safeguard against AI-powered photo editing for misinformation? https://t.co/awTVTX6oXf
My @MIT students hacked a way to "immunize" photos against edits: https://t.co/zsRxJ3P1Fb (1/8)
Can we cast ML predictions as simple functions of individual training inputs? Yes! w/ @andrew_ilyas @smsampark @logan_engstrom @gpoleclerc, we introduce datamodels (https://t.co/ypR7MTCrHS), a framework to study how data + algs -> predictions. Blog: https://t.co/DQiF0wyHQw (1/6)
@PreetumNakkiran @dustinvtran This was our goal when we open-sourced https://t.co/kfAajLA9R7. It is the same length as the PyTorch ImageNet example but up to 10x faster. We would love to hear feedback from the community on how we can improve it.
@jefrankle @PreetumNakkiran (3) Because FFCV does GPU augmentation and data movement in parallel, it might still give you a little boost in other cases. It's always worth a try. Feel free to ask technical questions on our Slack!
@jefrankle @PreetumNakkiran @jefrankle From our experience, yes. (1) The first case is when you are IO-bottlenecked: with the appropriate parameters, FFCV will dramatically improve the throughput you get from your storage. (2) FFCV makes it easy to move augmentation to or from the CPU to maximize speed. 1/2
@ArashVahdat - Allows declaring arguments where they need to be
- Allows capturing the arguments where they are needed
- Supports defining arguments through a combination of config files (easy to track in git) and the CLI for environment-dependent args
https://t.co/dpoQNo4Jpo
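A minimal sketch of the "config file plus CLI" idea from the list above, using only the standard library's `argparse` and `json`. This is not the linked library's API: the function name `load_args` and the `lr` and `data_path` arguments are hypothetical, chosen only for illustration.

```python
import argparse
import json


def load_args(config_text, cli_args):
    """Merge defaults from a JSON config with command-line overrides.

    Values from the config act as defaults; anything passed on the
    command line (e.g. a machine-specific path) takes precedence.
    """
    config = json.loads(config_text)

    parser = argparse.ArgumentParser()
    # Hypothetical arguments, for illustration only.
    parser.add_argument("--lr", type=float)
    parser.add_argument("--data-path")

    # Config values become parser-level defaults.
    parser.set_defaults(**config)
    return parser.parse_args(cli_args)


# The config would normally live in a file tracked in git.
config_text = '{"lr": 0.1, "data_path": "/datasets/default"}'
args = load_args(config_text, ["--data-path", "/mnt/local/data"])
print(args.lr, args.data_path)
```

The learning rate comes from the config, while the environment-dependent path is supplied on the command line.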
@crude2refined @aleks_madry Colab only runs Python 3.7, which doesn't include `multiprocessing.shared_memory`. Therefore the earliest compatible version is 3.8 :/ As soon as Colab updates Python we will have an example notebook!
@yanndubs @aleks_madry @williamfalcon @PyTorchLightnin We have been using PTL with FFCV in our lab with success for quite a bit now. They are definitely complementary. The only caveat is that one has to override a few things from PTL. We will release a demo soon, but feel free to join our Slack, where it has been discussed.
@code_star @giffmana We have been using FFCV internally on shared clusters with many different GPUs, including V100s, 2080 Tis, and 1080 Tis, and it really helped a lot. That's especially true since most of these clusters use network-attached storage, don't have fast local storage, and share CPUs among users.
@Anshumali_ @aleks_madry We did try it (it was definitely better but still much slower than FFCV), but due to its lack of good interop with PyTorch and the fact that webdataset is meant to fulfill the same function, we decided to stick with the latter for our thorough benchmarking.
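For context on the version requirement: `multiprocessing.shared_memory` was added to the standard library in Python 3.8. The sketch below checks for it and does a small round trip through a shared block. It illustrates the standard-library module only, not how FFCV uses it; the helper names are hypothetical.

```python
import sys

try:
    # Added in Python 3.8; this import fails on 3.7.
    from multiprocessing import shared_memory
    HAS_SHARED_MEMORY = True
except ImportError:
    HAS_SHARED_MEMORY = False


def supports_shared_memory():
    """True when the interpreter is new enough to ship the module."""
    return sys.version_info >= (3, 8) and HAS_SHARED_MEMORY


def roundtrip(payload):
    """Write bytes into a shared block and read them back by name."""
    block = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        block.buf[:len(payload)] = payload
        # Attaching by name is what a second process would do.
        view = shared_memory.SharedMemory(name=block.name)
        try:
            return bytes(view.buf[:len(payload)])
        finally:
            view.close()
    finally:
        block.close()
        block.unlink()  # free the underlying segment


if supports_shared_memory():
    print(roundtrip(b"hello"))
```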
@jacobgorm @aleks_madry @schrep JPEG and RAW are just two example data types that FFCV can work with. It's really easy to add other Field Types. You can either keep them for yourself or submit a pull request! We would love to have WEBP support.
@RafailFridman @aleks_madry @ml_norms If it is sampled only once (i.e., getitem returns the same thing for the same index), FFCV can be used out of the box! Otherwise, you can (1) have getitem return the parameters of the distribution/do any needed pre-processing and (2) use FFCV's fast data pipeline to do the sampling.
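A rough sketch of the two-step pattern described in the reply above, in plain Python. It does not use FFCV's API: `ParamDataset` and `sample_stage` are hypothetical stand-ins, and the Gaussian parameters are made up for illustration. The dataset stays deterministic (same index, same output), and the randomness is deferred to a later stage that plays the role of the data pipeline.

```python
import random


class ParamDataset:
    """Deterministic dataset: each item is the (mean, std) of a Gaussian."""

    def __init__(self, params):
        self.params = list(params)

    def __len__(self):
        return len(self.params)

    def __getitem__(self, index):
        # Same index always returns the same parameters, so the
        # dataset can be written to disk once.
        return self.params[index]


def sample_stage(batch, rng):
    """Stand-in for a pipeline stage: draw one sample per item at load time."""
    return [rng.gauss(mean, std) for mean, std in batch]


dataset = ParamDataset([(0.0, 1.0), (5.0, 0.5), (-2.0, 2.0)])
batch = [dataset[i] for i in range(len(dataset))]

rng = random.Random(0)
print(sample_stage(batch, rng))  # fresh draws on every call
print(sample_stage(batch, rng))
```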
ImageNet is the new CIFAR! My students made FFCV (https://t.co/QWUdL5hRxS), a drop-in data loading library for training models *fast* (e.g., ImageNet in half an hour on 1 GPU, CIFAR in half a minute).
FFCV speeds up ~any existing training code (no training tricks needed) (1/3)
@PhongStormVN @aleks_madry While we haven't personally experimented with this one, FFCV was designed to accommodate virtually any dataset. In the case of COCO (segmentation map), one can easily store the segmentation map in an additional field. Feel free to join our Slack if you need help getting started!
@michalwols @kevin_zakka @chriswolfvision @aleks_madry @soumithchintala Indeed, none of the ready-to-go PyTorch code we found was able to saturate 8 A100s (that's part of why we wrote FFCV!). We also compared against the fastest speed/accuracy numbers we could find for PyTorch in the ResNet-50 8xA100 accuracy scatterplot.