3 insights newsrooms should consider as they experiment with generative AI
Smart Data Initiative Newsletter Blog | 12 July 2023
Hi everyone.
Greetings from full-on summer. Is there a long, complicated word in the German language for that feeling when it-seems-summer-still-feels-incredibly-long-like-surely-it-will-never-end? That’s usually the moment when I dust off my long list of things I’ve been meaning to read, and this year, my version of that is working on an AI reading list for you all. So, really, what I’m saying is that you will soon have a long list of things you’ve been meaning to read.
But for this week, I’m sharing with you an update from the Associated Press’ work on various AI projects.
Lots of happy summer vibes to you all,
Ariane
The Associated Press supports local newsrooms with AI projects
These days, I have the good fortune to hear about so many interesting projects kicking off using generative AI, at publishers of all sizes and levels of experience with this type of technology.
And this is exciting not just because letting a thousand flowers bloom will help this industry identify where this type of technology can be most helpful at its current stage of advancement, but also because the spirit of experimentation itself is an excellent muscle to build. <Loudspeaker announcement> Oh, while I am here, please do write to me to share your current experiments so I know to keep watch </announcement>
Whenever we experiment, though (and this point isn’t just about data, of course), we also lean on another muscle: change management. The chat I recently had with Ernest Kung, a product manager at The Associated Press who has been focusing on AI projects, gave me some really useful perspective on what this may look like for projects that involve generative AI.
Ernest and his colleagues at AP are working on five different projects using AI to support small, local newsrooms, with support from the John S. and James L. Knight Foundation. Except for the project in Puerto Rico, which uses NLP and structured data, the other four are likely to use generative AI technology.
These are the five projects, listed verbatim from The AP:
Automated writing of public safety incidents into the content management system of Minnesota newspaper Brainerd Dispatch.
Publication of Spanish-language news alerts using National Weather Service data in English by the newspaper El Vocero de Puerto Rico.
Automated transcription of recorded videos and summarising the transcripts to create an article’s initial framework at San Antonio, Texas, television station KSAT-TV.
Sorting of news tips and coverage pitches from the public and automatically populating them into the coverage planner of Allentown, Pennsylvania, television station WFMZ-TV.
Expanding the Minutes application, which creates transcripts of city council meetings, to include summarisation, keyword identification and reporter alerts, for staff at Michigan Radio’s WUOM-FM at the University of Michigan.
The projects are still in progress, so this newsletter isn’t a case study of them (yet; come back in a few months!), but they present an excellent range of ideas for where generative AI can support journalism by improving productivity on tedious, daily tasks. They are also ambitious enough to have a shot at making visible improvements for humans, but not so ambitious that they are out of reach.
As we spoke and took stock of how the projects were doing, Ernest shared three insights I thought were well worth your time as you consider your own generative AI projects.
1. Your generative AI project may be challenging for reasons that are entirely unrelated to generative AI.
There are already a good number of Natural Language Processing (NLP) applications that transform speech into text, and they have long been used for transcription. Naturally, transcription is a large part of the project AP is running with Michigan’s WUOM-FM, which transcribes and further processes city council meetings.
But the environment of a city council meeting is a large variable the project has to account for. This is not so much because transcription technology may not be up to the task, but because the audio recorded in this kind of setting has proven very challenging. Painting a picture for me of what the room, the mic setup, and the participants may all look like, Ernest noted that the audio recording itself imposed a hard limit.
Now, there are also various AI-powered tools that will clean up audio, and I’m sure spy agencies have even better tools we know nothing about for rescuing weak audio. But somehow, they don’t license their tech. The reality is that, for now, audio has to clear a certain quality bar before speech-to-text becomes an option.
2. Humans hold automation to a much higher bar of quality than they hold human labour.
Ernest was telling me about the project with WFMZ-TV in Pennsylvania where an AI-powered app aims to classify incoming e-mails from the public to create a coverage calendar.
There is a training component to this project because, as Ernest noted, what is newsworthy depends heavily on the specific news organisation using the feature. You can think of the approach as somewhat similar to training a spam filter.
Ernest noted there was a high bar of quality to clear before the newsroom team could rely on the new system. If fixing or verifying the output takes as much labour as doing the work by hand, does it help anyone? And this is true in general of any automation that replaces human labour: We’re less tolerant of approximation from machines (to wit: humans get into car accidents, but self-driving cars won’t take over our streets until their accident record is far closer to zero than ours would ever be).
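For readers curious what “similar to training a spam filter” can mean in practice, here is a minimal sketch of one classic spam-filter technique, multinomial naive Bayes, applied to incoming tips. This is not AP’s or WFMZ-TV’s system: the class name `TipClassifier`, the “cover”/“skip” labels, and the sample e-mails are all hypothetical. What it does illustrate is that the model learns only from examples that newsroom staff label, which is why “newsworthy” ends up specific to each organisation.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TipClassifier:
    """Hypothetical tip sorter: multinomial naive Bayes over word counts."""

    def __init__(self) -> None:
        self.doc_counts: Counter[str] = Counter()  # labelled e-mails per label
        self.word_counts: dict[str, Counter[str]] = defaultdict(Counter)
        self.vocab: set[str] = set()

    def train(self, text: str, label: str) -> None:
        """Record one e-mail an editor has labelled (e.g. 'cover' or 'skip')."""
        tokens = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, text: str) -> str:
        """Return the label with the highest log-probability for this e-mail."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            # Laplace (add-one) smoothing so unseen words don't zero out a label.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training examples, standing in for an editor's labelling work.
clf = TipClassifier()
clf.train("Water main break floods downtown street, residents evacuated", "cover")
clf.train("Fire at warehouse on Main Street, roads closed", "cover")
clf.train("Please promote our bake sale coupon discount", "skip")
clf.train("Buy cheap discount tickets now, limited offer", "skip")
print(clf.predict("Street closed after water main break"))  # prints "cover"
```

The labelling in the last few lines is exactly the “training from internal staff” discussed below: the sketch is only as good as the examples someone in the newsroom takes the time to sort.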
This reminded me of a few projects I have worked on that involved algorithmic sorting, and of how humans react when the sorting is incorrect. One example of such a system is Perspective, the Google API that scores comments for toxicity. News organisations that use it decide above which score they want to automatically block a comment and below which score they want to auto-publish it. Comments scoring in between are left for human moderators to handle.
In such an approach, the automation shrinks the original problem. Where humans used to have to read 100 comments, they may now read only 20. But this only works if the 80 comments automatically sorted into the “never publish” or “auto-publish” piles are sorted correctly. When the algorithm makes mistakes, humans are far less forgiving than they would be of a fellow human’s blunder.
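The threshold logic described above can be sketched in a few lines. This is a minimal illustration, assuming each comment already carries a toxicity score between 0 and 1 (Perspective returns scores in that range); it does not call the Perspective API, and the two thresholds and the sample scores are hypothetical values a newsroom would set for itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    toxicity: float  # assumed score in [0, 1], e.g. from a model like Perspective


# Hypothetical thresholds: each newsroom would choose its own.
AUTO_PUBLISH_BELOW = 0.2
AUTO_BLOCK_AT_OR_ABOVE = 0.9


def triage(comments: list[Comment]) -> dict[str, list[Comment]]:
    """Sort comments into auto-publish, auto-block and human-review piles."""
    piles: dict[str, list[Comment]] = {"publish": [], "block": [], "review": []}
    for c in comments:
        if c.toxicity < AUTO_PUBLISH_BELOW:
            piles["publish"].append(c)
        elif c.toxicity >= AUTO_BLOCK_AT_OR_ABOVE:
            piles["block"].append(c)
        else:
            piles["review"].append(c)  # the middle band goes to human moderators
    return piles


# Made-up comments and scores, for illustration only.
comments = [
    Comment("Great reporting, thank you", 0.05),
    Comment("I disagree with the council's decision", 0.15),
    Comment("This columnist is clueless", 0.55),
    Comment("<abusive comment>", 0.97),
    Comment("Typical politicians, all the same", 0.35),
]
piles = triage(comments)
share_for_humans = len(piles["review"]) / len(comments)
print({k: len(v) for k, v in piles.items()}, share_for_humans)
# prints {'publish': 2, 'block': 1, 'review': 2} 0.4
```

Widening the gap between the two thresholds sends more comments to humans but lowers the odds of an embarrassing automatic mistake; narrowing it does the opposite. That trade-off is where the “higher bar” for machines shows up in practice.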
3. Making an automation project succeed in a smaller organisation depends on workflow questions as much as on the automation itself.
One area Ernest and his colleagues are paying attention to is how new automated tools may fit into existing workflows, or whether the teams receiving these tools are willing to change their workflows so they can make the best use of them.
This is a bit of a paradox for many news organisations, whether smaller ones with limited resources like those Ernest is working with or large ones with plenty of resources: Workflows are usually honed with the precision of a factory. While strong processes usually create clarity, we have to recognise that automation almost always disrupts existing processes.
Even training an AI-powered system that’s still in its learning phase can be disruptive. Of course, a data scientist would say, “But that’s really an investment so you can eliminate certain tasks in the future.” But human resistance to process change can be very high.
Furthermore, as Ernest noted, we’re talking about folks who often have very long days and who, in the past few years, have had to absorb extra work as downsizing in their organisation consolidated several job functions into one role. Do these folks really have the brain space to cheerfully take in a period of disruption all in the name of some future, still-hypothetical improvement?
Ernest’s understanding of this predicament is helping him approach these projects with patience and compassion, and it is a lesson for any organisation approaching an AI-powered project where the training will fall to internal staff on top of their already full plates.
*
A common thread within these observations is that building internal tools, especially tools built on still-fledgling technology, involves a great deal of people management in addition to good product management and data science. The right project isn’t necessarily the one where the technology is the best fit, but rather one where humans and technology can live in harmony.
Further afield on the wide, wide Web
A few good reads from the wider world of data. This week:
Big bytes
- From New York Magazine/The Verge, a deep dive into the world of the companies which furnish Silicon Valley with the (cheap) workforce that’s been labeling every piece of data under the sun for the purpose of feeding the models. (The article: “AI is a lot of work.”)
- I’ve been super into the question of how IP would play out for music and voice in the context of generative AI (see past coverage in this newsletter). In “How actors are losing their voice to AI” (paywall), the FT looks at voice actors whose voices now turn up in thousands of audio products after they did licensed work years ago under terms that gave the original licensee full rights to derive whatever value it could from the recordings. In the age of synthetic media, these voice actors are raising their voice. [unclear whether this corny joke will survive the sharp knife of INMA’s editor. I’ll see myself out.]
- How’s your relationship with your legal department these days? Are they hounding you for compliance work and combing through your boilerplate language for data privacy violations? Are they concerned about just how compliant these new AI tools are going to be? Well, here’s something for you and for them: Stanford’s Center for Research on Foundation Models published an analysis of how well various large language models (including local favorites like OpenAI’s GPT-4 and Google’s PaLM 2) currently comply with the draft of the EU’s AI Act. It’s BigScience/HuggingFace that places first! But everyone is still a ways from the goalposts …
Small bytes
- AI startups have a lot of cash, but scarce data (WSJ): The question isn’t so much where these venture-money-soaked companies could turn to license data (lots of companies may be offering it), but rather how to do so in a manner that doesn’t compromise the IP.
- UK universities sign code of conduct for using generative AI (The Guardian): Many news organisations are doing similar work, mostly in silos, but UK universities drew up shared principles, and it’s always interesting to see how other industries approach tools and problems we may face too. In this case, the big focus is on educating educators and students alike about AI, rather than banning the tools.
- Mathias Döpfner, CEO of the German publishing house Axel Springer, spoke at the Cannes Lions Festival a couple of weeks ago about some of the opportunities generative AI offers, in particular cost efficiencies. For Döpfner, we should reinvest the savings we will make in “layout, error correction, translation, editorial production” in our core products. (The full speech, via YouTube)
- I always enjoy a good backlash to a backlash. This may be a bit of a hot take, but a not-insignificant part of the EU’s privacy laws is motivated by industrial competition rather than strictly by data privacy. It turns out, though, that the latest proposal from this corner, the EU’s AI Act, is not making Europeans very happy. As reported in The Verge, over 150 large European companies say the EU’s current plans for AI regulation could keep European companies from gaining a competitive technological edge. Signatories include Cédric O, France’s former minister of state for digital affairs. An interesting development, to be sure.
About this newsletter
Today’s newsletter is written by Ariane Bernard, a Paris- and New York-based consultant who focuses on publishing utilities and data products, and is the CEO of a young incubated company, Helio.cloud.
This newsletter is part of the INMA Smart Data Initiative. You can e-mail me at Ariane.Bernard@inma.org with thoughts, suggestions, and questions. Also, sign up to our Slack channel.