The result is that users can see the rich original document, hear a natural spoken version, and still get accurate highlighting as the audio plays.
Try Paper2Audio with one of our demo documents and see Reader View highlighting in action, or upload your own document.
“Just highlight the words as they’re read” sounds simple for text-to-speech document processing, until the document has equations, citations, Roman numerals, and formatting that shouldn’t be read aloud literally.
https://t.co/DzgNuC4xoh
The post explains how Paper2Audio solves this with a reconciliation algorithm that runs after audio generation, using matching words as anchors and then mapping mismatched sections like equations, citations, formatting, and Roman numerals back to the visible text.
Example: a document might have a section labeled “Part III,” but the TTS model may read “Part III” as “Part 3.” That means the visible text and spoken text do not match, but the highlight still needs to move through the audio transcript at the right time.
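The anchor-and-gap idea described above can be sketched with Python's standard-library `difflib.SequenceMatcher`. This is a minimal illustration under our own assumptions, not Paper2Audio's actual implementation. `map_timestamps` and `normalize` are hypothetical names, TTS timestamps are assumed to arrive as `(word, start, end)` tuples in milliseconds, and each mismatched span (such as "III" read as "3") simply has its spoken duration divided evenly across the visible words it replaces.

```python
from difflib import SequenceMatcher

def normalize(word):
    # Hypothetical normalization: compare words case-insensitively and
    # ignore surrounding punctuation or brackets.
    return word.strip('.,;:!?"()[]').lower()

def map_timestamps(visible, spoken):
    """Give each visible word a (start, end) time taken from the spoken words.

    visible: words shown in the transcript, e.g. ["Part", "III", ...]
    spoken:  (word, start_ms, end_ms) tuples from the TTS output (assumed format)
    Returns a list parallel to `visible`; None means the word was never spoken.
    """
    a = [normalize(w) for w in visible]
    b = [normalize(w) for w, _, _ in spoken]
    times = [None] * len(visible)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Anchors: matching words copy their timestamps one-to-one.
            for k in range(i2 - i1):
                _, start, end = spoken[j1 + k]
                times[i1 + k] = (start, end)
        elif tag == "replace":
            # Mismatch between two anchors ("III" vs "3"): spread the spoken
            # span's duration evenly over the visible words it stands for.
            start, end = spoken[j1][1], spoken[j2 - 1][2]
            step = (end - start) / (i2 - i1)
            for k in range(i2 - i1):
                times[i1 + k] = (start + k * step, start + (k + 1) * step)
        # "delete": visible-only words (e.g. a citation that is not read) stay None.
        # "insert": spoken-only words have nothing to highlight, so they are skipped.
    return times

if __name__ == "__main__":
    visible = ["Part", "III", "introduces", "the", "model"]
    spoken = [("Part", 0, 300), ("3", 300, 600), ("introduces", 600, 1200),
              ("the", 1200, 1300), ("model", 1300, 1700)]
    print(map_timestamps(visible, spoken))
```

Visible words that are never spoken come back as `None`, so a player could keep the previous highlight through them. Splitting a gap evenly is the simplest choice; weighting each visible word by its length would be a reasonable refinement.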
We create a new spoken version of the text for the TTS model while still showing the original document's formatting in Reader View. That creates a challenge for word highlighting: audio timestamps come from the spoken version, but highlights need to line up with the enhanced transcript.
In our latest Paper2Audio engineering blog post, we go behind the scenes on a surprisingly tricky part of building Reader View, our enhanced audio transcript: keeping word-level highlighting accurate when the text users see in the transcript is not the same text read aloud.
*Bug fixes & UI improvements, part 3*
-Fewer unintended pauses in the narration.
-Fixed bugs related to autoplay not advancing to the next document, some documents getting stuck during processing, transcript syncing across clients, narration of foreign characters, and more.
May updates from Paper2Audio!
*Citation removal for EPUBs*
Footnote references and content are now stripped from EPUB audio narration for cleaner listening to academic and non-fiction books. This was previously only available for PDFs.
*Bug fixes & UI improvements, part 2*
-Switch between grid and list layout in the document library and collections on desktop.
-Exported and copied highlights now include PDF page numbers, making it easier to find the passage in the document when you reference it later.
*Bug fixes & UI improvements, part 1*
-Playback now rewinds 3 seconds when you resume after a long pause, so you don't lose context.
-Live countdown ETA showing the estimated completion time for documents being processed, updated in real time.
-Faster audio downloads on iOS when the app is open.
*Push notifications from the app when your document is ready*
Get notified that your document is done processing even when the app is closed, so you don't have to wait on the processing screen.
*Better handling of visual elements*
-Captions from figures and tables in PDFs are now displayed below visual elements in Reader View.
-Improved accuracy of AI-generated summaries for figures, tables, code blocks, and math equations in PDFs.
*Narration improvements*
-Math subscripts and superscripts are spoken more naturally instead of being read out literally.
-Improved pronunciation of abbreviations, including common ones.
-Roman numeral detection now more accurately reads numerals like "III" as "the third."
-Better header removal.
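Roman numeral handling can be illustrated with a small rule-based sketch. Everything here is hypothetical and not Paper2Audio's method: the function names, the set of section words that trigger a plain number (so "Part III" becomes "3"), and the fallback to an ordinal (so "Henry III" becomes "the third"). A strict regular expression rejects malformed numerals such as "IIII", and a bare "I" is skipped because it is usually the pronoun.

```python
import re

# Hypothetical rules for illustration only.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
# Well-formed numerals from 1 to 3999; rejects forms like "IIII" or "IC".
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth",
            6: "sixth", 7: "seventh", 8: "eighth", 9: "ninth", 10: "tenth"}
# Words after which a numeral is read as a plain number ("Part III" -> "Part 3").
SECTION_WORDS = {"part", "chapter", "section", "volume", "book"}

def roman_to_int(token):
    """Return the value of a well-formed Roman numeral, or None."""
    if not token or not ROMAN_RE.fullmatch(token):
        return None
    total = 0
    for ch, nxt in zip(token, token[1:] + " "):
        value = ROMAN_VALUES[ch]
        # Subtractive notation: a smaller symbol before a larger one (IV, XC).
        if nxt in ROMAN_VALUES and ROMAN_VALUES[nxt] > value:
            total -= value
        else:
            total += value
    return total

def speak_roman(prev_word, token):
    """Return a spoken form for `token`, or None to leave it unchanged."""
    if token == "I":
        return None  # too ambiguous: usually the pronoun "I"
    value = roman_to_int(token)
    if value is None:
        return None
    if prev_word.lower() in SECTION_WORDS:
        return str(value)                 # "Part III" -> "3"
    if value in ORDINALS:
        return "the " + ORDINALS[value]   # "Henry III" -> "the third"
    return str(value)
```

Rules like these are brittle: an all-caps word such as "MIX" is also a valid numeral, which is why real detection needs more context than a single preceding word.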
If you use Paper2Audio for accessibility-related reasons and the Plus plan would better meet your needs than the free plan, please contact us by email. We offer accessibility discounts and would be happy to share more information.
Today (May 21) is the 15th annual Global Accessibility Awareness Day (GAAD), a yearly event spearheaded by the GAAD Foundation to get more people talking, thinking and learning about digital access and inclusion.
We know accessibility is an ongoing commitment and we’re continuing to improve Paper2Audio so more people can comfortably read, listen to, and learn from the documents that matter to them.
Some of our accessibility features:
-Audio versions of documents for users who can’t rely only on visual reading
-Reader View, which makes it easier to read documents with complex formatting on a small screen
-Website and apps with screen reader support
-Dyslexia-friendly font options
Digital accessibility is central to our mission of helping people engage with and learn from their documents, whether they prefer to read, listen, or both. This is personal for us, too. Even on our small team, we have people who rely on accessibility tools for migraines and ADHD.
•Rich text formatting: We preserve the formatting of your documents, including math, headings, lists, and inline styling. Citations are shown so that you know when an author is making a reference, but citation text is only read aloud when needed to keep sentences intact.
Paper2Audio is now optimized for reading your documents, not just listening to them. We are excited to announce Reader View, our new way of reformatting documents so they are more readable and fit your screen, while keeping rich content like images and the original formatting.
•"Figure view" for visuals: Click on any visual element to open the figure view pop-up, then zoom and pan for a more detailed view.
•Single column view: Documents with multiple columns are displayed in a single column to improve readability on smaller screens.