In recent years, my research has focused on unifying models and training paradigms across modalities. Today I'm excited that we're releasing our latest model aligned with this theme:
Gemma 4 12B, a dense encoder-free model which processes raw text, image, and audio inputs!
1/
@youtalk Ignore the slightly inaccurate pose estimation, but here's my TV. The warping on the flat surface is due to inaccuracies in the network's depth estimation.
Created a 3D point cloud of my room using my iPhone's LiDAR, camera, and pose in ROS 2 with the help of https://t.co/nwQxZM0h6c @youtalk. Main takeaway: the iPhone depth sensor is not as accurate as I thought. It relies on sensor fusion with the camera, which introduces the same kinds of inaccuracies seen in monocular depth estimation.
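For anyone curious how depth, camera intrinsics, and pose combine into a point cloud, here's a minimal sketch in plain Python, assuming an ideal pinhole camera with no lens distortion. The function name `backproject_depth` and its parameters (`fx`, `fy`, `cx`, `cy`, `rotation`, `translation`) are hypothetical placeholders for illustration, not the API of the linked tool or of ROS 2.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[Point]:
    """Turn a depth image into world-frame 3D points (hypothetical helper).

    depth       -- rows of per-pixel depth in meters; values <= 0 are skipped
    fx, fy      -- focal lengths in pixels (assumed known from calibration)
    cx, cy      -- principal point in pixels
    rotation    -- 3x3 camera-to-world rotation matrix
    translation -- camera position in the world frame
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:  # skip invalid or missing depth readings
                continue
            # Pixel -> camera frame using the pinhole model (no distortion).
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Camera frame -> world frame: p_world = R * p_cam + t.
            world = tuple(
                sum(rotation[i][j] * cam[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
            points.append(world)
    return points


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    tiny_depth = [[1.0, 0.0], [2.0, 2.0]]  # toy 2x2 depth image
    for p in backproject_depth(tiny_depth, 1.0, 1.0, 0.0, 0.0, identity, [0.0, 0.0, 1.0]):
        print(p)
```

Because x and y are both scaled by the depth value z, any error in the estimated depth also shifts the point sideways, which is one way a flat surface can end up looking warped.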
Claude and Codex subscriptions at $20 each are a killer combo. Having two models that can offer opposing opinions is amazing on its own, and they also come from two companies trying everything they can to outcompete one another. 😙
Got my first subscriber to my YouTube account for bilingual Chinese-English music. Got sidetracked by some vision projects, with CVPR models getting released left and right. But this is exciting.
@Ric_RTP It’s still cost-effective if each person has one agent. It gets incredibly expensive when each employee is deploying 5 or more agents, because there’s no longer a human in the loop and things become sloppy.
talked to a YC company that scaled from $0 → $2m ARR in their first 6 months with their ENTIRE GTM built off going to conferences.
Here's the playbook they cracked (step by step):
~4 weeks before:
> Post abt the conference and tell attendees exactly how to reach you
> Send personal DMs to the right ppl on LinkedIn and X
> Reply within the hour & lock in 10 top targets to close.
> Send everyone else to your drip email campaign.
Then, set a meeting block of 1-3 days during the conference:
> Make a shared booking link for the team
> Reserve a quiet café / private dining room
> Pack in 12 meetings per day, 30 min each, with buffer time built in
While you're there:
> Hand every prospect a thoughtful small gift and a personal card
> Single out 5 standout customers whose pain ur product actually solves
> Pull them aside for a casual on-camera Q&A in a solid film spot
> Don't pitch hard.
> Let the conversation breathe and weave your product in naturally.
The 4 weeks after:
> Hand the raw footage to a freelance editor + ask for ~15-20 punchy clips with captions.
> Drop a new clip every couple of days on LI / X
> Use these clips when you post online about the next conference to keep the momentum going
This is the formula, and it costs less than a few thousand dollars to execute.
They’re on track to end Y1 at ~$6m ARR (B2B, targeting large enterprises) + STILL not using any other channels for customer acquisition
@jianyuan_wang Distortions of buildings in the background and some duplication of objects like the trees in the foreground. I guess one reason is that it's not estimating camera distortion parameters. But the background distortions seem to be a common issue with monocular depth estimation.