Just built a fitness app for Poke.
- Track calories with just a simple text or a picture of your food
- Have Poke coach you and be ur personal workout buddy
DM me if u want me to send u the beta link.
Say hi to the new Poke! 🌴
Now officially approved by Apple to text on Apple Messages, as the first and only AI agent.
Chat now: https://t.co/VIWYU64dUI
@MilHoornaert @interaction It's just convenience. There are tons of other ways out there, like a spreadsheet, but I find using an LLM/agent to be a lot easier and more natural.
@beardthelyon @interaction Oh yea it does. Absolute game changer. It pulls steps and active calories from Apple Health. Hoping to integrate more soon.
Just stumbled upon a new paper called COBALT.
COBALT = crowdsourced robot teleop from smartphones
While the smartphone teleop control interface is interesting, the paper goes deeper and explores a scalable cloud infrastructure for collecting imitation learning data and for remote teleop.
Maybe this is the universal way to bridge the gap?
Give it a read: https://t.co/B6xiudKqJ4
We're dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video.
It combines Gemini's intelligence with our generative media systems - representing a leap forward in world understanding, multimodality, and editing 🧵
For new and upcoming research, arXiv is ur best friend.
Just found this super interesting one about how they're trying to detect "false claims" for returns on Amazon, DoorDash and Uber Eats by checking whether the photo is AI-generated or real.
One of the biggest bottlenecks in VLA models is long-horizon reasoning.
Standard policies just look at the current frame:
π(aₜ | imageₜ, instruction). This works fine for short tasks, but after 20–100 steps, minor errors compound, the robot drifts, and everything breaks down.
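Quick back-of-the-envelope sketch of why small errors compound over a long rollout. It assumes every step succeeds independently with the same probability, and the 99% per-step figure is a made-up number for illustration (neither comes from any paper); `rollout_success_probability` is a hypothetical helper.

```python
def rollout_success_probability(per_step_success: float, num_steps: int) -> float:
    """Chance that every step of a rollout succeeds.

    Assumption (illustrative only): steps succeed independently,
    each with the same probability `per_step_success`.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be in [0, 1]")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return per_step_success ** num_steps


if __name__ == "__main__":
    # 0.99 is a hypothetical per-step success rate, not a measured value.
    for steps in (20, 50, 100):
        p = rollout_success_probability(0.99, steps)
        print(f"{steps} steps: {p:.3f}")
    # prints:
    # 20 steps: 0.818
    # 50 steps: 0.605
    # 100 steps: 0.366
```

Even a policy that is right 99% of the time per step finishes a 100-step task only about a third of the time under this toy model, which is why long horizons hurt so much.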
TraceVLA proposes giving the robot an explicit visual memory of its past trajectory without bloating the context window.
Instead of passing a massive history of raw frames, it overlays motion traces directly onto the current image using a dense point tracker.
The pipeline is super simple:
1. Track keypoint trajectories from past frames
2. Filter for actual movement
3. Overlay those trajectories onto the current observation frame
4. Feed both the clean image and the trace image into the VLA backbone
Input sequence: [Image tokens] → [Trace tokens] → [Language tokens] → Transformer → Action
Instead of forcing the transformer to reason across a long temporal context window, TraceVLA compresses the trajectory history into a spatial representation. The robot doesn't need to remember the entire action sequence because it can just see the path it already took.
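Here's a rough sketch of steps 2 to 4 of the pipeline, just to make the idea concrete. It is not the paper's code: it assumes the keypoint tracks already exist (step 1 needs a real point tracker), uses plain Python lists instead of NumPy/OpenCV to stay dependency-free, and every name in it (`filter_moving_tracks`, `overlay_traces`, `build_policy_inputs`, the 5-pixel threshold) is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]    # (x, y) in pixels
Track = List[Point]            # one keypoint's positions over past frames
Pixel = Tuple[int, int, int]   # (r, g, b)
Image = List[List[Pixel]]      # indexed as image[y][x]


def path_length(track: Track) -> float:
    """Total distance a keypoint travelled across the past frames."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def filter_moving_tracks(tracks: List[Track], min_path_px: float = 5.0) -> List[Track]:
    """Step 2: keep only keypoints that actually moved (threshold is an assumption)."""
    return [t for t in tracks if path_length(t) > min_path_px]


def overlay_traces(image: Image, tracks: List[Track],
                   color: Pixel = (255, 0, 0)) -> Image:
    """Step 3: draw each track as a polyline on a copy of the current frame."""
    traced = [row[:] for row in image]  # copy so the clean image stays untouched
    height, width = len(image), len(image[0])
    for track in tracks:
        for (x0, y0), (x1, y1) in zip(track, track[1:]):
            # Sample roughly one point per pixel along the segment.
            steps = max(1, math.ceil(max(abs(x1 - x0), abs(y1 - y0))))
            for i in range(steps + 1):
                a = i / steps
                x = round(x0 + a * (x1 - x0))
                y = round(y0 + a * (y1 - y0))
                if 0 <= x < width and 0 <= y < height:
                    traced[y][x] = color
    return traced


def build_policy_inputs(image: Image, tracks: List[Track]) -> Tuple[Image, Image]:
    """Step 4: return (clean image, trace image) to hand to the VLA backbone."""
    return image, overlay_traces(image, filter_moving_tracks(tracks))


if __name__ == "__main__":
    frame = [[(0, 0, 0)] * 64 for _ in range(64)]
    moving = [(5.0, 10.0), (20.0, 10.0), (35.0, 10.0), (50.0, 10.0)]
    static = [(30.0, 40.0)] * 4
    clean, traced = build_policy_inputs(frame, [moving, static])
    print(traced[10][25])  # on the moving keypoint's path
    print(traced[40][30])  # static keypoint was filtered out, so nothing drawn
```

The point of the design is that the whole motion history ends up as one extra image, so the extra context cost stays fixed no matter how many past frames went into the traces.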
The results speak for themselves:
• +10% improvement over OpenVLA in simulation
• 3.5× higher success rate on real robot tasks
• Strong cross-embodiment generalization
Check the OG paper: https://t.co/Ylw5PTqAcZ
What if robots could imagine…
As I was learning more about VLAs, I came across smth called VAMs.
The core idea is that just like humans can "imagine" what's going to happen and "visualize" the action for picking up a cup or placing it in a box, VAMs aim to do the same for robots.
And in my head, the same way u train a policy by teleoperation and mimicking the movement, this is tryna recreate the same concept.
This was the one I found: https://t.co/b0lLHrolWP
Lmk ur thoughts, if I misunderstood smth or if u wanna chat about it in more detail.