ezio auditore @trathpai - Twitter Profile

trathpai retweeted

5 days ago

INSID3 segments objects across domains using ONLY ONE annotated example. It works entirely without a segmentation decoder, task-specific fine-tuning, or external mask generators like SAM. CVPR 2026 paper with enormous practical potential.

8

635

82

529

53K

trathpai retweeted

Wildminder

@wildmindai

17 days ago

More cool stuff from NVIDIA. LocateAnything - high-speed visual search engine. You provide a text prompt and it instantly pinpoints that object's exact location in an image.
- 10x speedup for dense object detection
- Qwen2.5-3B + Moon-ViT
- Fast/Slow/Hybrid modes
- trained on 138M samples for UI, docs, generic grounding
https://t.co/bEvD6pRKaR

13

1K

152

1K

51K

ezio auditore @trathpai

22 days ago

@skalskip92 @NielsRogge @ilyasut Single-object trackers and multi-object trackers are missing from the vision space.

0

14

ezio auditore @trathpai

9 months ago

The history books >>>> the title #INDvPAK #IndianCricket #AsiaCup2025

0

190

ezio auditore @trathpai

10 months ago

@xenovacom Thanks for sharing! I was just trying it but couldn't see selections when clicking on an object of interest. Also, processing video doesn't show any selected area. Is there something I am missing?

0

50

trathpai retweeted

Forrest Iandola @fiandola

over 1 year ago

[1/n] Efficient Track Anything from @Meta: interactive video segmentation and tracking on an iPhone!

13

510

106

320

66K

trathpai retweeted

Ian Johnson 🔬🤖

@enjalot

over 1 year ago

I am obsessed with Sparse Autoencoders! SAEs unpack so much existing value and unlock exciting new capabilities. It's happening in text, images and even proteins. This is a long thread with lots of links and quote tweets of the projects, articles and code that made me 🤯

17

1K

188

2K

215K

trathpai retweeted

AK

@_akhaliq

over 1 year ago

kotaemon: An open-source, clean & customizable RAG UI for chatting with your documents. Built with both end users and developers in mind. https://t.co/8pNB8hbEGN

2

84

18

73

11K

trathpai retweeted

Niels Rogge @NielsRogge

almost 2 years ago

Alright finally able to dreambooth myself with Flux for free! Note that this is actually what @levelsio or services like @FAL or @replicate are monetizing. Here's how (small 🧵): https://t.co/OPd3Lqb0dp

12

697

58

1K

171K

ezio auditore @trathpai

almost 2 years ago

@paperswithcode Is it being updated? The last updates were from 30th July!

0

20

ezio auditore @trathpai

almost 2 years ago

Divided by borders, united by blue #CrowdStrike #microsoft

0

2

0

626

ezio auditore @trathpai

almost 2 years ago

Luma Labs dancing and transformation. @LumaLabsAI #LumaDreamMachine

0

1

0

97

trathpai retweeted

Aran Komatsuzaki

@arankomatsuzaki

about 2 years ago

Meta presents Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach. Features trained on their automatically curated datasets outperform ones trained on manually curated data. https://t.co/HBlj7kHMPC https://t.co/GFvRecFKHr

3

558

109

457

290K

trathpai retweeted

Find me on bsky @colin-fraser.net @colin_fraser

about 2 years ago

There’s an art to distilling these to the absolute minimal necessary text. The human brain can’t comprehend how stupid these things are without practice. https://t.co/NM3cUZXOt5

97

10K

676

785

1M

ezio auditore @trathpai

over 2 years ago

@EMostaque Hard-working Yoda selling burgers on an NYC street, night scene, cinematic

0

99

trathpai retweeted

AK

@_akhaliq

over 2 years ago

Analyzing and Improving the Training Dynamics of Diffusion Models paper page: https://t.co/qzZIA0SxW6 Diffusion models currently dominate the field of data-driven image synthesis with their unparalleled scaling to large datasets. In this paper, we identify and rectify several causes for uneven and ineffective training in the popular ADM diffusion model architecture, without altering its high-level structure. Observing uncontrolled magnitude changes and imbalances in both the network activations and weights over the course of training, we redesign the network layers to preserve activation, weight, and update magnitudes on expectation. We find that systematic application of this philosophy eliminates the observed drifts and imbalances, resulting in considerably better networks at equal computational complexity. Our modifications improve the previous record FID of 2.41 in ImageNet-512 synthesis to 1.81, achieved using fast deterministic sampling. As an independent contribution, we present a method for setting the exponential moving average (EMA) parameters post-hoc, i.e., after completing the training run. This allows precise tuning of EMA length without the cost of performing several training runs, and reveals its surprising interactions with network architecture, training time, and guidance.

2

508

99

328

282K

trathpai retweeted

Sasha Rush

@srush_nlp

over 2 years ago

Several people have told me over drinks that these puzzles are being used for ML tech interviews 🤣. https://t.co/B5L2bja1DN https://t.co/Yk1lWRqilN

14

1K

130

3K

212K

trathpai retweeted

AK

@_akhaliq

over 2 years ago

Style Aligned Image Generation via Shared Attention. paper page: https://t.co/GsbI3fRShE Large-scale Text-to-Image (T2I) models have rapidly gained prominence across creative fields, generating visually compelling outputs from textual prompts. However, controlling these models to ensure consistent style remains challenging, with existing methods necessitating fine-tuning and manual intervention to disentangle content and style. In this paper, we introduce StyleAligned, a novel technique designed to establish style alignment among a series of generated images. By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models. This approach allows for the creation of style-consistent images using a reference style through a straightforward inversion operation. Our method's evaluation across diverse styles and text prompts demonstrates high-quality synthesis and fidelity, underscoring its efficacy in achieving consistent style across various inputs.