If you’re serious about robot learning, you (unfortunately) need to know about video compression. Camera streams account for 90+% of the data volume in most datasets, even when compressed. Video is more complicated to deal with, but the size wins are too big to give up.
The unit of compression is a Group of Pictures (GOP). In the simplest case (what you should use in robotics), a GOP starts with a keyframe (I-frame) that is followed by several delta frames (P-frames). Delta frames only need to encode the difference from the previous frame, which is where the compression win comes from.
That means to decode frame 15 of a 30-frame GOP, you need to feed all the preceding frames in the GOP to the decoder to get out that one frame. The GOP length controls the tradeoff between random access and compression: shorter GOPs make seeking cheaper, longer GOPs compress better.
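To make the decode cost concrete, here is a minimal sketch. It assumes a fixed GOP length and the simple I-frame plus P-frames layout described above (no B-frames), and it uses 0-based frame indices. The function names are made up for illustration.

```python
def frames_to_decode(target: int, gop_size: int) -> range:
    """Frame indices (0-based) the decoder must process to output `target`.

    Assumes every GOP has exactly `gop_size` frames and starts with a keyframe.
    """
    if target < 0 or gop_size < 1:
        raise ValueError("target must be >= 0 and gop_size >= 1")
    keyframe = (target // gop_size) * gop_size  # first frame of the enclosing GOP
    return range(keyframe, target + 1)


def mean_decode_cost(gop_size: int) -> float:
    """Average number of frames decoded per uniformly random single-frame read."""
    # Positions 1..gop_size within a GOP are equally likely, so the mean is (g + 1) / 2.
    return (gop_size + 1) / 2
```

With 30-frame GOPs, the 15th frame (index 14) costs 15 decoded frames, and a random read costs 15.5 frames on average. Halving the GOP length roughly halves that cost, at the price of twice as many keyframes.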
Why does this matter for robot learning? Because during training, dataloader performance is dominated by fetching and decoding video. A streaming dataloader (you need one for large datasets) has to take GOPs into consideration when fetching data for a time step. Building a dataloader that doesn’t starve your GPUs is hard enough that most teams forgo flexibility. That means researchers at most of the best-funded robotics efforts currently wait around for large export jobs before training can start, after every change to the dataset mix or every wrong hyperparameter.
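As a rough sketch of what "taking GOPs into consideration" can look like: keep a small per-video index of where each GOP starts, then turn a requested frame into one byte-range fetch plus a decode count. The `GopEntry` layout and both function names are hypothetical, not taken from any particular library or dataset format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class GopEntry:
    first_frame: int   # index of the keyframe that starts this GOP
    byte_offset: int   # where the GOP starts in the video file
    byte_length: int   # size of the GOP in bytes


def gop_for_frame(index: list[GopEntry], frame: int) -> GopEntry:
    """Find the GOP containing `frame`; `index` must be sorted by first_frame."""
    if not index or frame < index[0].first_frame:
        raise IndexError(f"frame {frame} is not covered by the index")
    starts = [g.first_frame for g in index]
    # Last GOP whose keyframe is at or before the requested frame.
    # Note: this sketch does not know the video length, so it cannot
    # reject frames past the end of the final GOP.
    return index[bisect_right(starts, frame) - 1]


def fetch_plan(index: list[GopEntry], frame: int) -> tuple[int, int, int]:
    """Return (byte_offset, byte_length, frames_to_decode) for one frame."""
    gop = gop_for_frame(index, frame)
    return gop.byte_offset, gop.byte_length, frame - gop.first_frame + 1
```

For example, with GOPs starting at frames 0, 30 and 60, a request for frame 44 maps to the second GOP's byte range and 15 frames of decoding. A real loader would also batch requests that land in the same GOP so each GOP is fetched and decoded once.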
This situation obviously won’t last: they all know that experiment cycle time is a key lever for fast progress, and the competitive pressure is enormous. If you want to compete in this space, you need both flexibility and performance.
If Cursor can’t fix a problem properly, apparently this prompt is worth a try:
"Reflect on 5-7 different possible sources of the problem, distill those down to 1-2 most likely sources, and then add logs to validate your assumptions before we move onto implementing the actual code fix"