Visual Explorer: OCR'ing A Year And A Half Of CSPAN Through Tesseract
To seed further research into the new kinds of insights that could be derived by searching and analyzing the onscreen text of our nation's governance using open OCR tools, today, in collaboration with the Internet Archive's TV News Archive and the multi-party Media-Data Research Consortium, we are releasing a new dataset of nearly a year and a half of Tesseract-OCR'd text from CSPAN, running January 1, 2022 through April 30, 2023. The dataset was created by applying Tesseract to each of the Visual Explorer preview images, which are sampled every 4 seconds. In all, 11,192 broadcasts totaling 10,375,897 images, representing 41.5 million seconds of airtime, were OCR'd by Tesseract, yielding 1.5GB of JSON containing 472MB of OCR'd text.
https://t.co/AhLPisvu8a
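The airtime figure follows directly from the sampling rate, since each preview image stands for 4 seconds of broadcast. A minimal Python check of that arithmetic, assuming exactly one frame every 4 seconds with no gaps (the helper name `airtime_seconds` is illustrative):

```python
# Assumption: each Visual Explorer preview image represents exactly
# 4 seconds of airtime (one frame every 4 seconds, no gaps).
SAMPLE_INTERVAL_SECONDS = 4


def airtime_seconds(num_frames: int, interval: int = SAMPLE_INTERVAL_SECONDS) -> int:
    """Return the airtime in seconds covered by num_frames sampled frames."""
    return num_frames * interval


frames = 10_375_897
broadcasts = 11_192
seconds = airtime_seconds(frames)
print(f"{seconds:,} seconds, about {seconds / 3600:,.0f} hours")  # 41,503,588 seconds
print(f"average broadcast: about {seconds / broadcasts / 60:.0f} minutes")
```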
Fully Autonomous Diplomacy + Counter-Messaging Experiments With ChatGPT + GDELT
Given the ability of Large Language Models (LLMs) like ChatGPT to craft human-like prose, how easily could they be used to watch television news fully autonomously, identify narratives that run counter to US interests and generate articulate, fluent counter-messages for different mediums, ready for distribution without any human intervention? Such use cases are extremely ethically fraught, but their inevitable application raises the question of just how easy current tools might make this process and how usable the end results might be.
Overall, the results here suggest that ChatGPT and GDELT can be combined today with just a few lines of code to create a fully automated narrative monitoring and counter-messaging system. The results also suggest that ChatGPT 3.5 cannot fully recreate the distinctive voice of non-Western media, especially media systems that feature heavily contextualized narration, though its output is not far removed from some past human-driven counter-messaging efforts undertaken by Western nations. Most importantly, with proper prompt engineering, additional examples and fine-tuning, one could readily produce an LLM capable of writing in a more authentic voice.
The kind of fully automated counter-messaging workflow presented here raises myriad ethical and moral questions, but the near-certainty that such workflows will proliferate in the immediate term makes it necessary to better understand what these systems might look like, and their nuances, in order to learn how to identify and counter them.
In the end, the idea of a fully automated counter-messaging system is no longer science fiction – it is here today and available with just a few lines of code.
https://t.co/BzzTnn02xK
The timeline below compares the percentage of airtime on business television news channels that mentioned President Biden versus Elon Musk since the start of last year, showing that twice last year coverage of Musk nearly equaled that of Biden, a reflection of his outsized media persona.
https://t.co/x7auGfBMWc
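One way to express "percentage of airtime" is as the share of fixed-length caption clips on each channel that mention a term. The sketch below assumes airtime has been divided into equal-length clips; the clip structure, the `Clip` type and the `percent_airtime` helper are hypothetical, and matching is a simple case-insensitive substring test:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Clip:
    channel: str  # station identifier (hypothetical input)
    text: str     # caption text for one fixed-length clip (hypothetical input)


def percent_airtime(clips: list[Clip], term: str) -> dict[str, float]:
    """Per channel, the percentage of clips whose text mentions term."""
    totals: dict[str, int] = {}
    hits: dict[str, int] = {}
    needle = term.lower()
    for clip in clips:
        totals[clip.channel] = totals.get(clip.channel, 0) + 1
        if needle in clip.text.lower():
            hits[clip.channel] = hits.get(clip.channel, 0) + 1
    # Because every clip has the same length, share of clips equals share of airtime.
    return {ch: 100.0 * hits.get(ch, 0) / n for ch, n in totals.items()}
```

Substring matching is crude (a search for "Musk" also matches "muskrat"), so a more careful version would match whole words.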
WashPost: The TikTok Fight Is A Generational Fight
The Post's Philip Bump includes a graph of mentions across television news using the TV Explorer:
https://t.co/cTvOtUI9Tr
WashPost: Ray Epps Seeks The Seemingly Impossible: An Apology From Tucker Carlson
References television news coverage of Ray Epps via the TV Explorer:
https://t.co/bHOeC3gu17
Fox News Dominates Mentions Of "Radical" Over The Past Decade
As the timeline and graph below show, Fox News has dominated mentions of the word "radical" over the past decade.
https://t.co/EkBzRMETck
Being "Canceled" Took Off In 2020 On Television News But Has Been Fading Since 2021
The timeline below tracks total mentions of "canceled" on television news, showing that mentions were nearly equal across channels through mid-2020, when the term took off on Fox News; mentions on Fox have been declining since peaking in March 2021.
https://t.co/h49ZbwvX3S
Mentions of "woke" and "wokeness" surged on Fox News beginning in January 2021, and over the last three months they have surged on CNN and MSNBC as well.
https://t.co/hmUUUQ21gI
Visual Explorer: Creating Visual Networks Of Facial Co-Occurrences On An Episode Of Russian TV News' 60 Minutes – Revisited
Last week we demonstrated using a simplistic facial extraction and visual clustering pipeline to extract the faces from a single episode of Russia 1's "60 Minutes" and build a co-occurrence graph of who appears alongside whom. To make the pipeline as easy to use as possible, we paired an older face extractor, less accurate than modern tools but extremely fast, with a perceptual hash-based clustering postprocessor that grouped faces together to track them across frames. The results suggested considerable promise for this analytic approach, but also demonstrated the fundamental limitations of such a simple pipeline. Today we revisit that exploration using a modern face extraction and clustering pipeline that yields vastly more accurate results.
https://t.co/JY39VjDiHZ
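The perceptual hash clustering step can be sketched as follows. The post does not say which perceptual hash was used; this sketch swaps in the common difference hash (dHash), assumes each face crop has already been downscaled to a 9×8 grayscale thumbnail, and groups faces greedily by Hamming distance. The 10-bit threshold and the function names are illustrative assumptions:

```python
def dhash(pixels):
    """Difference hash of a 9-wide x 8-tall grayscale thumbnail (list of rows).

    Each bit records whether a pixel is brighter than its right-hand neighbor,
    giving 8 bits per row and 64 bits in total.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def cluster_hashes(hashes, max_distance=10):
    """Greedy clustering: join the first cluster whose seed is within max_distance."""
    seeds, labels = [], []
    for h in hashes:
        for idx, seed in enumerate(seeds):
            if hamming(h, seed) <= max_distance:
                labels.append(idx)
                break
        else:
            # No existing cluster is close enough, so this face starts a new one.
            seeds.append(h)
            labels.append(len(seeds) - 1)
    return labels
```

Greedy seed-based clustering is order-dependent, and pixel-level hashes are sensitive to changes in pose and lighting, which is likely one reason modern pipelines rely on learned face representations instead.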
Adding Confidence Scores To Tracking A Year Of Tucker Carlson On Russia 1's "60 Minutes"
Last month, in collaboration with the Internet Archive's TV News Archive, we demonstrated scanning a year of Russia 1's "60 Minutes" for all appearances of Tucker Carlson. Let's repeat that analysis with a more advanced tool that also scores the distance between each extracted face and the source face, allowing us to post-filter the results to remove false positives, identify the strongest matches and so on.
https://t.co/i6tRUMNljN
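Post-filtering on the distance score amounts to keeping only candidates below a threshold and ranking them. A minimal sketch, assuming each face has already been converted to a numeric embedding vector and that distance is Euclidean; the 0.6 cutoff and the `filter_matches` name are illustrative assumptions, not the tool's actual settings:

```python
import math


def filter_matches(reference, candidates, max_distance=0.6):
    """Keep candidate faces within max_distance of the reference, closest first.

    reference: embedding of the source face (sequence of floats)
    candidates: mapping of frame id -> embedding of a face found in that frame
    """
    scored = [(frame_id, math.dist(reference, emb)) for frame_id, emb in candidates.items()]
    # Drop likely false positives, then rank the remaining matches by closeness.
    kept = [(frame_id, d) for frame_id, d in scored if d <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])
```

Lowering `max_distance` trades recall for precision; the strongest matches sit at the head of the returned list either way.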
Sampling Russian television news broadcasts every 4 seconds and comparing every pair of these "visual ngrams" across an entire broadcast yields a powerful tool for cataloging advertising, identifying key advertising trends across the Russian television news landscape and tracking how the ad economy is adjusting in the face of global sanctions. With more sophisticated tooling for identifying ad content and signature-based tracing approaches, it would be possible to construct, fully automatically, a live catalog of advertising activity across Russian television news to understand the brands, industries, products and services being advertised and how that composition has changed over the past year as the impact of sanctions has continued to build.
https://t.co/msrMrEeqkz
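A signature-based tracing approach could be sketched like this: treat each run of consecutive frame hashes as a signature and flag signatures that recur across multiple broadcasts as candidate advertisements. The run length `k`, the `min_broadcasts` cutoff, the function names and the exact-match comparison are all simplifying assumptions; real perceptual hashes vary slightly between airings, so a practical system would need tolerant matching.

```python
from collections import defaultdict


def signatures(frame_hashes, k=3):
    """Yield each run of k consecutive frame hashes as a tuple signature."""
    for i in range(len(frame_hashes) - k + 1):
        yield tuple(frame_hashes[i:i + k])


def recurring_signatures(broadcasts, k=3, min_broadcasts=2):
    """Map signature -> sorted broadcast ids, keeping those seen in >= min_broadcasts.

    broadcasts: mapping of broadcast id -> list of per-frame hashes in airing order
    """
    seen = defaultdict(set)
    for broadcast_id, hashes in broadcasts.items():
        for sig in signatures(hashes, k):
            seen[sig].add(broadcast_id)
    return {sig: sorted(ids) for sig, ids in seen.items() if len(ids) >= min_broadcasts}
```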
In collaboration with the @internetarchive, the Visual Explorer extracts one frame every 4 seconds from each broadcast to create a "visual ngram" that non-consumptively captures the core visual narratives of the broadcast. What if we took all of those images for a given Russian TV news broadcast and compared each image to every other image in that broadcast based on pixel-level visual similarity (using a perceptual hash)? The end result would allow us not only to identify contiguous sequences (marking "shot changes") but, most importantly, to identify repeated content that appears multiple times throughout a broadcast, ranging from a clip aired at several different points in the broadcast to repeated advertisements.
https://t.co/ZQ4YMk2i4U
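The pairwise comparison described above can be sketched in a few lines. This assumes each frame has already been reduced to a 64-bit perceptual hash, treats a large Hamming distance between consecutive frames as a shot change, and reports near-identical frames that fall in different shots as repeated content. The 10-bit threshold and the `analyze` name are assumptions:

```python
def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def analyze(hashes, threshold=10):
    """Return (shot_change_indices, repeated_frame_pairs) for one broadcast."""
    n = len(hashes)
    # Full pairwise distance matrix: every frame against every other frame.
    dist = [[hamming(hashes[i], hashes[j]) for j in range(n)] for i in range(n)]
    # A new shot starts whenever a frame differs sharply from the one before it.
    shot = [0] * n
    for i in range(1, n):
        shot[i] = shot[i - 1] + (1 if dist[i - 1][i] > threshold else 0)
    changes = [i for i in range(1, n) if shot[i] != shot[i - 1]]
    # Near-duplicates in *different* shots indicate content that aired again.
    repeats = [(i, j) for i in range(n) for j in range(i + 1, n)
               if shot[i] != shot[j] and dist[i][j] <= threshold]
    return changes, repeats
```

The full matrix grows quadratically with the number of frames, but a one-hour broadcast sampled every 4 seconds yields only about 900 frames, or roughly 810,000 comparisons.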
Visualizing Who Appears Alongside Whom On An Episode Of Russian TV News' 60 Minutes
Who appears alongside whom on television news represents a key editorial decision of what voices to pair. From split-screen displays to the back-and-forth of presenters and guests, understanding co-occurrence patterns on television news offers a powerful lens into the underlying narrative storytelling of a broadcast. What if we could analyze such co-occurrence patterns automatically, generating a network visualization of the faces that appear onscreen in the same frame or subsequent frames over an entire broadcast?
https://t.co/AHZNj4rAlY
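Counting co-occurrences is a small step once faces have been clustered into identities. A sketch, assuming each frame is given as a set of face cluster IDs (hypothetical input) and counting both same-frame pairs and pairs spread across the next `window` frames; the resulting weighted edge list could then be loaded into a graph library for visualization:

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(frames, window=1):
    """Count how often two face ids appear in the same frame or within `window` frames.

    frames: list of sets of face cluster ids, in broadcast order
    Returns a Counter mapping sorted (id_a, id_b) pairs to edge weights.
    """
    edges = Counter()
    for i, faces in enumerate(frames):
        # Faces sharing the screen in the same frame (e.g. split-screen).
        for a, b in combinations(sorted(faces), 2):
            edges[(a, b)] += 1
        # Faces in subsequent frames (e.g. presenter/guest back-and-forth).
        for j in range(i + 1, min(i + 1 + window, len(frames))):
            for a in faces:
                for b in frames[j]:
                    if a != b:
                        edges[tuple(sorted((a, b)))] += 1
    return edges
```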
Yesterday, in collaboration with the @internetarchive's TV News Archive, we announced the availability of more than 1 billion words of transcribed and translated Belarusian, Iranian, Russian and Ukrainian television news broadcasts. How might we examine these transcripts with ChatGPT to understand what a day of Russian television news says about Ukrainian President Volodymyr Zelensky?
https://t.co/qJgISDJpx9
In collaboration with the @internetarchive, more than a billion words of Belarusian, Iranian, Russian and Ukrainian television news are now accessible for narrative analysis:
https://t.co/PiKzvy40s4
Rep. Marjorie Taylor Greene's (MTG's) opposition to military support for Ukraine remains popular on Russian state television, which has aired material such as this excerpt of her CPAC speech and one of her Tucker Carlson appearances.
https://t.co/KUO4QpmTQC