I will present NoteIt at #UIST2025. NoteIt can convert instructional videos into interactive notes while faithfully preserving chapter/step structure and multimodal key information.
🎧 Oct. 1, 10:00 AM | Capri Room
Instructional videos are a good way to master new skills, but what happens when you forget a single step or minor ingredient? For most, that means fast-forwarding through the whole video – or scrolling through chapters in the caption. But a team of researchers from HKU Engineering and @HkuIot57647 wanted a better way. Using AI, they developed NoteIt, an automated note-taking tool that can identify and extract steps and information from instructional videos and turn them into interactive notes that can be viewed in multiple formats.
That’s easier said than done, says one of the paper’s lead authors, @Running_Zhao. While existing tools can summarise videos, they struggle to preserve the step-by-step structure and multimodal content used in most instructional videos. Solving the problem required the HKU team to identify both chapter-level and step-level structures, as well as static and dynamic “keyframes” – which mark transitions between shots. The tool also needed to be able to capture and analyse verbal content, including differentiating between instructions and tips or warnings.
Finally, the team wanted it to be interactive, with users able to adjust the notes according to their preferred level of detail and learning style. The result, Zhao says, is a tool “that can generate notes with the hierarchical structures and key content emphasised by the creators in the video.” (1/2)
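As a rough illustration of the hierarchy described above (chapters containing steps, with keyframes, tips, and warnings attached to each step), here is a minimal Python sketch. Every class, field, and function name in it is hypothetical and chosen only for illustration; it is not NoteIt's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Keyframe:
    # Hypothetical fields: where the frame sits in the video and whether
    # it is a static image (False) or a short dynamic clip/GIF (True).
    timestamp_s: float
    dynamic: bool = False


@dataclass
class Step:
    # The instruction is kept separate from tips and warnings, mirroring
    # the distinction the post describes for verbal content.
    instruction: str
    tips: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    keyframes: List[Keyframe] = field(default_factory=list)


@dataclass
class Chapter:
    title: str
    steps: List[Step] = field(default_factory=list)


def render_outline(chapters: List[Chapter]) -> str:
    """Render chapters and steps as a numbered, indented text outline."""
    lines = []
    for c, chapter in enumerate(chapters, start=1):
        lines.append(f"{c}. {chapter.title}")
        for s, step in enumerate(chapter.steps, start=1):
            lines.append(f"  {c}.{s} {step.instruction}")
            lines.extend(f"    tip: {t}" for t in step.tips)
            lines.extend(f"    warning: {w}" for w in step.warnings)
    return "\n".join(lines)
```

Keeping tips and warnings as separate fields on each step means a renderer can show or hide them independently of the core instruction.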
What we need most right now are AI applications that improve work efficiency and quality of life in practice. NoteIt (by @Running_Zhao) is cool. I can't wait to find my personal AI coach + training notes!
(1/n) 🛠 We are thrilled to present NoteIt, a system that automatically converts instructional (How-To) videos into interactive notes. #UIST2025
🔗website: https://t.co/RyIlz8Y0oE
(5/n) 3️⃣ Make it yours. With NoteIt’s interface, learners can customize the content verbosity (concise or detailed), presentation modality (text-only or text-image/GIF pairs), and engagement mode (printable or interactive) to match their preferences.
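The three customization options above can be pictured as a small settings object. The Python sketch below is hypothetical: the enum and class names, and the default values, are assumptions made for illustration and are not taken from NoteIt's interface or code.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import List


class Verbosity(Enum):
    CONCISE = "concise"
    DETAILED = "detailed"


class Modality(Enum):
    TEXT_ONLY = "text-only"
    TEXT_WITH_MEDIA = "text-image/GIF pairs"


class Engagement(Enum):
    PRINTABLE = "printable"
    INTERACTIVE = "interactive"


@dataclass(frozen=True)
class NoteSettings:
    # Defaults are arbitrary assumptions, not NoteIt's documented defaults.
    verbosity: Verbosity = Verbosity.CONCISE
    modality: Modality = Modality.TEXT_ONLY
    engagement: Engagement = Engagement.PRINTABLE


def all_settings() -> List[NoteSettings]:
    """Enumerate every combination of the three two-way options."""
    return [
        NoteSettings(v, m, e)
        for v, m, e in product(Verbosity, Modality, Engagement)
    ]
```

With two choices for each of the three options, this sketch yields eight possible note configurations.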