AGI achieved externally in the 4chan chat by miqudev anon, on 29th January 2024.
Here goes a 🧵with Miqu rocking everything I ask (from datasets, random things I find from the internet and more).
Feel the AGI!!
Using the Q5 (biggest model) version, with this llama.cpp config:
Wow! Motion capture studios not gonna love this!
Just check this insane video.
For years, capturing human motion meant markers, skin-tight suits, and hours of cleanup.
MAMMA just asks for a few synced cameras pointed at the scene.
Out comes a full 3D body for every person, every frame.
No suits. No markers.
The clever bit: instead of tracking a handful of joints, it reads hundreds of points.
And it actually understands contact.
It knows when a foot touches the ground, when two people are holding each other.
So feet stop sliding and bodies stop passing through each other, even when dancers are tangled up close.
When @Michael_J_Black calls this maybe the biggest day in 3D capture history, you pay attention.
The kicker? It's basically as accurate as the gold-standard Vicon systems studios pay a fortune for.
A multi-day pipeline drops to a single day. It even works with 4 iPhones.
Here's why roboticists should care: clean, realistic human motion at scale, without the expensive rig.
The thing holding humanoids back was never really the algorithm. It was the data.
Keep up good work the @Hanzcun, @soyong_shin, @AYiannakidis and the rest of the MAMMA team!
🤯 LOS DE CUPHEAD SE HAN VUELTO LOCOS🔴
El nuevo juego de Cuphead está siendo desarrollado nativamente para la Sega Master System. Studio MDHR se lo está tomando tan en serio que lo están programando en lenguaje ensamblador puro y respetando las especificaciones técnicas de los 80s.
Obviamente saldrá para PC y consolas actuales, pero han confirmado que también va a existir un cartucho físico real para la Master System.
Qué ganas de jugar el Mighty Cuphead Adventure para ver esta locura xdd
I ran 2D pose estimation on a pickleball rally, with skeletons on all four players through the entire point:
- Keypoint R-CNN (torchvision) detects the 17 body joints for each player
- PyTorch runs the model inference
- OpenCV connects the joints and draws the skeleton overlay onto the frames
Buy a red car and within a week, red cars are suddenly everywhere. Same number on the road as last month. Your brain just decided red matters to you now, so it quit letting them blur past and started flagging every single one. The same switch is running your whole life.
Scientists named that little glitch the frequency illusion, and the clip above is the same effect dressed up as physics. A row of weights hangs from one string, each a different length, so each likes to swing at its own speed. Wiggle the bar at the right pace and only the weight that matches it takes off, swinging wild, while the rest barely budge. That is resonance. It is also how a radio works: the dial locks onto one station out of the thousands flying through the air and ignores the rest.
Your focus tunes the exact same way, and it can be ruthless about what it throws out. In a famous Harvard study, people watched a short clip and counted how many times a basketball got passed around. Easy enough. Around half of them never noticed the woman in a full gorilla suit who strolled into the middle of the players, faced the camera, and beat her chest for a solid nine seconds. Nine out of ten people are sure they would have caught it. They are usually wrong.
So the tweet is partly right. Your brain does run a filter that picks what gets through to you. Self-help says you can aim that filter like a wish and pull money or love straight to your door. The truth is closer to a sleepy bouncer than a genie. Its whole job is to notice what you keep staring at and wave more of it through. You are finally seeing the red cars that were there the entire time.
The same filter has a darker setting. In depression and anxiety, it gets stuck on the bad stuff, the threats and the losses, and struggles to look away, which quietly feeds the mood that jammed it there to begin with. There is research into gently coaching that filter back toward calmer things. The results are mixed so far, but the idea holds.
Whatever you keep pointing your attention at really does grow louder in your world. Inside your own skull, a machine decides what you see, running on whatever you keep feeding it. The ball that swings is the one you keep tapping in rhythm.
Just published the source code and research paper to this approach.
Repo also includes a demo voxel editor to try out yourself.
https://t.co/z4k0LPEwll
Morph 2K pre-orders are now open!
A $199 analog video scaler for retro gaming with composite, S-Video, SCART, component, and VGA support. Incredible video quality, CRT simulation, auto-sampling, Wi-Fi updates, and more.
To celebrate launch day, we're giving away one Morph 2K.
To enter:
✓ Follow @PixelFXco
✓ Like & repost this post
Winner selected June 20th.
https://t.co/xQqM20tdnt
REPA thoughts
REPA as x0 prediction
- the DINO/Encoder features prediction can be seen loosely as "x0 prediction in an alternate representation space than main diffusion space, from an intermediate/rather than final hidden state"
- If so, we should consider the same principles for predicting x0, like loss weighting etc.
- simply: we know noise prediction is "impossible" from fully clean image (we know nothing about the noise sample), x0/clean image prediction is "impossible" from fully noised state (we know nothing of the image). velocity as (x0 - eps) suffers at both endpoints, at x0 we know nothing of eps and can't really predict this direction, and vice versa
- the general form of this is both endpoints of the trajectory are a domain, and when we're on the opposite side, the minimum risk prediction is to predict the mean of the opposing domain/dataset (or conditional slice if relevant)
- Because of this, predicting DINO features at a fully noised state may not give us a helpful signal yet might be disproportionately weighted due to high error/grad magnitude.
IREPA claim that structure > representation quality
- If so, representation space features might be superfluous
- "Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model" demonstrates that image diffusion models naturally develop a representation of depth, fairly reliably predicted even at high noise levels with only a linear probe
- This observation could be "flipped on its head" we know a performant pretrained model has this quality in the end, we could do the inverse and add supervision/guidance to encourage this early
- segmentation maps, depth, etc feel like good candidates if its spatial/structure properties that drive the improvement as they nicely correlate to low frequency coarse structure
Mapper design, spatial target + parameterization axes
- a few papers have covered this, but the design of the mapper and level of compression describes the assumption we're making and guidance we enforce
- Spatial/Compression: i.e. full spatial patch-to-patch matching, pooled, or something in between like a downsample/perceiver pooling
- Depth and Architecture of mapper: how much of the burden of adaptation for this secondary task falls on the backbone model vs the actual mapper parameters.
- are we wanting to be a linear projection away from the reference space? a shallow MLP away? a deeper mapping?
-- My impression is the shallower options deliberately ask the diffusion representation space to already be somewhat denoised (because we're mapping to clean x0 in DINO space) and a more trivial mapping away.
-- Deeper options can sort out the noise filtering and more extensive work to map to DINO space, if that's adapted/isolated to the mapper, then i feel the question is less about taking on a shape that resembles the reference feature space, and instead having high mutual information. To successfully predict the features, the information has to be there, any variables around getting to that other space the mapper squares away.
You can use mujoco warp to compile your simulation as a cuda graph, and then launch that cuda graph from C to have absurdly fast robotics simulations. Nvidia warp also open sources a bunch of really great renderers, fluid sim stuff