Mayank gaur

@mayank_2OOO

ML Engineer 🤖 | Building with Generative AI & Computer Vision. Following tech research, politics, finance, chess. Opinions expressed are mine.

New Delhi, India

Joined October 2017

113 Following

24 Followers

76 Posts

Pinned Tweet

Mayank gaur @mayank_2OOO

about 2 months ago

I just came back from a trip with 250+ videos and photos in my camera roll. I wanted to post a video capturing the best moments, but manually scrubbing through gigabytes of raw footage to find the perfect clips and photos is a massive, time-consuming bottleneck. So, I'm building a solution: Watch It. If you're interested, follow along to watch me build this. 🚀 Here is the first look:

3

5

0

0

147

Mayank gaur @mayank_2OOO

about 1 month ago

@Manu_Sisti Oh sorry my bad! I have followed you now @Manu_Sisti

0

0

0

0

3

Mayank gaur @mayank_2OOO

about 1 month ago

The goal for Watch It is to give everyone a professional video editor in their pocket. This Beat Engine is the "brain" that makes it possible. If you’re building in the GenAI/Video space, I’d love to hear how you’re handling AV-sync. Drop a comment! 🚀 #BuildInPublic #AI #ComputerVision

0

0

0

0

50

Mayank gaur @mayank_2OOO

about 1 month ago

Most AI-generated videos feel "uncanny" or boring. Why? Because the visual energy is decoupled from the audio intent. If the beat drops but the camera doesn't move, the human brain flags it as low-quality. I just finished the "Beat Engine" for Watch It. It's designed so that it doesn't just listen for beats; it analyzes musical intent. The lyrics extractor is also ready. Lyrics analysis is important for capturing the emotional feel of assets and placing the right assets at the right time. For example, a nice sunset photo will greatly enhance the emotional impact if the lyric line says something like 'sun goes down'. Spoiler: the lyrics extractor is just an LLM call 😊

2

3

0

0

67
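
The Beat Engine post describes placing the right asset at the right time by reading the lyrics, with the sunset photo and 'sun goes down' as the example. A minimal sketch of that idea, assuming each asset already carries a set of descriptive tags and that matching is plain keyword overlap. The tag sets, file names, stopword list, and the `best_asset` helper are all hypothetical and are not taken from Watch It; the posts only say the lyrics themselves come from an LLM call.

```python
import re

# Hypothetical stopword list; just enough to ignore filler words.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is"}


def tokens(text):
    """Lowercase a lyric line and return its non-stopword words as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def best_asset(lyric_line, assets):
    """Return the asset whose tags overlap most with the lyric line.

    `assets` maps an asset name to a set of tags (an assumed format).
    Returns None when no asset shares a word with the line.
    """
    words = tokens(lyric_line)
    best_name, best_score = None, 0
    for name, tags in assets.items():
        score = len(words & tags)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical camera-roll assets with hand-written tags.
assets = {
    "sunset_beach.jpg": {"sun", "sunset", "beach", "sky"},
    "street_food.mp4": {"food", "market", "street"},
}

print(best_asset("The sun goes down", assets))
print(best_asset("Dancing all night", assets))
```

Keyword overlap is only a stand-in here; embedding similarity or a second LLM call would likely capture mood better, at the cost of more moving parts.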

Mayank gaur @mayank_2OOO

about 1 month ago

The engine now outputs a Normalized Intensity Score. This score acts as a bridge between the raw audio signal and the Video Model. Instead of just "Cut here," the engine sends instructions like:
High Intensity: Trigger a high-magnitude Zoom Punch + Flash.
Medium Intensity: Execute a hard cut to a new perspective.
Low Intensity: Maintain the shot but adjust the camera "drift."
The result? Transitions will be much better.

1

0

0

0

29
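
The Normalized Intensity Score post maps three intensity bands to three kinds of edit instruction. A rough sketch of how such a mapping could look, assuming min-max normalization to the range 0 to 1 and two cut-off values. The thresholds, the action names, and the `EditInstruction` type are assumptions made for illustration; the post does not say how the score is computed or where the bands begin.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; the post does not give numeric thresholds.
HIGH_THRESHOLD = 0.7
LOW_THRESHOLD = 0.3


@dataclass(frozen=True)
class EditInstruction:
    time_s: float  # beat timestamp in seconds
    action: str    # illustrative action label, not a real API value


def normalize(raw):
    """Min-max scale raw beat strengths into [0, 1] (an assumed scheme)."""
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0] * len(raw)
    return [(x - lo) / (hi - lo) for x in raw]


def instruction_for(time_s, intensity):
    """Pick an edit instruction from a normalized intensity score."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    if intensity >= HIGH_THRESHOLD:
        action = "zoom_punch_flash"    # high: zoom punch plus flash
    elif intensity >= LOW_THRESHOLD:
        action = "hard_cut"            # medium: cut to a new perspective
    else:
        action = "hold_adjust_drift"   # low: keep the shot, tweak drift
    return EditInstruction(time_s, action)


# Made-up beat times and raw strengths, purely for demonstration.
beat_times = [0.5, 1.0, 1.5]
scores = normalize([2.0, 4.0, 6.0])
for t, s in zip(beat_times, scores):
    print(instruction_for(t, s))
```

Keeping the thresholds as named constants makes it easy to tune the bands per song or per editing style without touching the mapping logic.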

Mayank gaur @mayank_2OOO

about 2 months ago

https://t.co/8kv3mFeqT9

0

0

0

0

19

Mayank gaur @mayank_2OOO

about 2 months ago

The Fourier Transform is hard to wrap your head around. Most explanations are buried in complex math and dense equations. But I just found a blog post that explains the FFT with pure, vector-based, visual intuition. If you work with data, signals, or ML, bookmark it; it's worth a read. The resource is in the comments.

1

0

0

0

35

Mayank gaur @mayank_2OOO

about 2 months ago

This is a heavy multimodal challenge, and I'm building the entire pipeline in public. I’ll be posting my dev logs, architecture breakdowns, interesting insights, and the actual friction of making audio and vision models play nice. Follow along to watch me build this. 🚀

0

0

0

0

43

Mayank gaur @mayank_2OOO

about 2 months ago

The hardest (and most fun) part I'm tackling right now is the "Matchmaker" agent. I'm building it to parse lyrics, pair high-impact visuals with the right audio cues, and inject smart transitions and motion effects so the final output actually feels like it was paced by a human editor.

1

0

0

0

47

Mayank gaur @mayank_2OOO

about 2 months ago

@buildanythingso Hi

0

0

0

0

7

Mayank gaur @mayank_2OOO

about 2 months ago

Hot take: Image models won’t win on aesthetics alone anymore. They’ll win on:
• fidelity (don’t touch what matters)
• controllability (edit specific regions)
• consistency (multi-image coherence)
On those axes, ChatGPT Images 2.0 looks very strong. And that’s what production teams actually care about.

0

0

0

0

74

Mayank gaur @mayank_2OOO

about 2 months ago

I’ve been using Nano Banana 2 for product creatives. Biggest issue I kept hitting: it re-renders text on the product itself. If your product is text-heavy (labels, packaging), the model subtly changes it and breaks brand accuracy.

Just tried ChatGPT Images 2.0. And this is the first thing that stood out: it preserves product text far more reliably. No unwanted “creative reinterpretation” of labels.

This is a bigger deal than it sounds. Because for real-world pipelines:
• Packaging text = compliance
• Branding = non-negotiable
• Even small changes = unusable asset
Most models still fail here.

Second big unlock: localized editing actually works. You can:
→ Change background
→ Adjust composition
→ Keep product untouched
Earlier models struggled with this balance.

What this means in practice. You can now:
• Keep product fidelity
• Iterate on creatives around it
• Avoid full regeneration loops
That’s a major workflow improvement.

So is it “better than Nano Banana 2”? Too early to say definitively. Gonna need more testing. But for:
→ Product-heavy creatives
→ Text-sensitive assets
→ E-commerce pipelines
ChatGPT Images 2.0 already feels like a step ahead.

The real shift: we’re moving from “generate everything again” to “edit precisely without breaking what matters.” That’s where these models start becoming production-ready.

1

0

0

0

79

Mayank gaur @mayank_2OOO

about 2 months ago

As someone building in the AI space, the GPT image to mask feature is a total game-changer. I've been holding out for reliable mask support since Nano Banana 1. Precise control over generation is finally here! 🚀🪄 #MachineLearning #genai #agenticAI

0

1

0

0

53

Mayank gaur @mayank_2OOO

3 months ago

@Justine17856705 @AmericaSpoof @grok is this true?

1

1

0

0

3K

Mayank gaur @mayank_2OOO

3 months ago

@rahulnegiwho 26 sure is a messy time.

1

1

0

0

18
