Record label battles with generative AI likely have lessons for news industry

By Ariane Bernard


New York City, Paris


Hi everyone. 

This week, I am back to torment INMA’s editor with pop music content. I really am trying hard to see if there are limits to her patience with my editorial choices. If I’m not back in two weeks, you know what happened.

All jokes aside, the roiling waves of legal news around generative AI are truly very interesting to look into, so I hope you’ll enjoy this look at how record labels are taking on generative AI and what this may mean for us from a different corner of the media industry.

Very interested to hear your thoughts on this, so don’t be a stranger:

All my best, Ariane

PS: Don't miss news of my latest report on generative AI, which was released today and is free to INMA members.

Generative AI and the test of copyright law

A little musical moment for this newsletter. And this will be a temporary respite from my relentless Swiftie programming because this week, it’s about Drake and The Weeknd.

You see, there is a great lil’ bop “from” the two artists making the rounds on TikTok. Now, I am no music critic, but as the youth say and to use a technical term: It slaps. You could hear a sample of it on this tweet when I wrote this newsletter, but the copyright laws seem to have caught up to it just before it was published:

Between the writing of this newsletter and its publication, copyright claims also removed this post featuring the AI-generated song.

Where this gets interesting for our purposes: AI generated the song.

UMG, the record label behind Drake and The Weeknd, has not enjoyed the work, to say the least. It is fighting on two fronts: pushing streaming services to block AI companies from scraping its artists' songs, as well as blocking the AI-generated tunes from being distributed (see more in this article from the Financial Times).

It's been fairly successful on that last goal. In the few days between the release of the song and my submitting this article for editing, the song from TikTok creator Ghostwriter977 was removed from YouTube, Spotify, and the original TikTok. That said, you can still find it if you search for the name of the tune, "Heart on my Sleeve," because the Internet never forgets.

I have just spent the past three months researching a report on generative AI. The legal and intellectual-property ramifications of generative AI were among the most interesting, but also most complicated, parts of that research.

In the matter of blocking AI-generated tunes, for example, there is a question of authorship, which is rather complex to dissect: Neither the artists nor the composers of the songs on which the AI trained are the authors of the AI's synthetic work. And the financial interests backing these artists (UMG in this case) aren't rights holders to the synthetic tunes either.

Yet there is a question of what I could best call “filiation,” which really is not addressed in our current intellectual property concepts simply because there were no real grounds for it until now.

Until this age of Artificial Intelligence, someone was an author or they weren’t. If they used any form of non-human tools to aid in creation, these systems or their originators didn’t get credit (musical instrument makers don’t get credit, recording equipment companies don’t get credit, and the AI doesn’t have copyright either). And if someone inspired a piece of work, or even gave the idea for the work, they have no IP claim to it (ideas, famously, cannot be copyrighted or patented).

So this takes us to the question of how the scraped data on which the AI trained can be considered a part of the output — this filiation question. 

Does the training data confer any copyright in the output? That's about the only place where there may be some actual enforceable claims. Lawsuits like the one Getty Images brought against Stability AI for training Stable Diffusion on its licensed content are making the claim that AI training goes beyond fair use, even though there is a degree of transformation involved in the synthetic output.

Much of the output from AI-created sound does not fall under current copyright laws.

The rest of the output, like the voice being used, is not copyrightable in the current state of the law. There are some provisions where the law blocks the use of a well-known voice generated by AI, but those have to do with how the voice is used rather than with the voice itself being copyrightable.

In 2020, Jay-Z tried to block deepfakes mimicking his voice but found he couldn't: YouTube reinstated the videos he had tried to get banned.

And lest you think the problem would go away for UMG if it succeeded in preventing AI makers from scraping its artists' work: That would probably only slow down the AI's learning. There are plenty of places for an AI to "learn" the voice of Drake that don't require licensed material. Drake gives interviews; he speaks on his social media. The AI can learn hip hop from lots of non-licensable places and can "learn" Drake's voice from lots of places where UMG has no copyright to defend.

“ChatGPT and similar tools commit a highly sophisticated form of plagiarism,” said Jenna Burrell, the director of research at Data & Society, in an op-ed in Tech Policy Press. “The bigger concern is how ChatGPT concentrates wealth for its owners off of copyrighted work. It’s not clear if the current state of copyright law is up to the challenge of tools like it, which treat the Internet as a free source of training data.”

In the news media, we, too, have strong authors whose distinctive voices are part of the final product we put out. Voice may be literally vocal, but writing style can be distinctive enough, too. News organisations are often known for distinctive signatures, and long-time readers can recognise their creative styles.

As far as audio, there are plenty of synthetic voices available to get your content out in automated ways, but audio branding is, of course, a very real thing — and not just the jingles in front of audio programming. 

As an example from the U.S., certain NPR programmes are instantly recognisable to frequent listeners just by the type of production they receive (Radiolab, my beloved). Voices, of course, are the strongest branded signature for the content itself in an audio-only environment.

Aftenposten in Norway trained an AI on the voice of its podcast host, Anne Lindholm, so every article became available as audio in a voice listeners already associated with its brand and products. It could have used an existing synthetic voice and called it a day, but the extra trouble was worth it.

Many news brands, like NPR's Radiolab, are instantly recognisable by frequent listeners, meaning their audio is part of the brand itself.

Even without generative AI, delivery style or writing style could be considered a certain kind of brand asset. In 2015, BuzzFeed created a Tom Friedman quote generator modelled after The New York Times columnist (the Twitter bot is still there, though it hasn't tweeted since 2016). This one is clearly parody, so it is protected speech under the First Amendment, and unlikely to devalue the brand asset of The New York Times.

But if you are UMG and you signed Drake for what appears to be a record amount of cash, the synthetic tunes are potentially devaluing the asset you fought pretty hard to get. Flood the market with enough synthetic Drake songs … are the real ones worth less?

Not to mention that in the economy of streaming services, potential earnings are a zero-sum game, pegged as a proportion of streaming time against a fixed pool of royalty money per subscriber. In this respect, Synthetic Drake and Real Drake are fighting for the same streaming minutes.

What should be interesting for the news media is looking at how the broader world of content creators and their rights holders — whether that’s UMG or Hollywood studios — take on AI, because their assets usually have more individual longevity in the market than everyday news has.

In news, the totality of our catalogs (our archive) has value, but very few assets have long-standing value in individual distribution. Some of your articles continue to do well in search for years, but most are a flash in the pan. We therefore tend to think of the value of our archive mostly through the prism of global B2B licensing deals rather than end-user single-item distribution.

The assets of a movie studio or record label's catalog continue to have individual distribution value for much longer than news does, so there are really two battles ahead for these media rights holders: picking a fight with AI companies on the matter of the training data and with distributors of the synthetic material on the matter of infringement.

But the news media could, in the end, also face a similar challenge if an AI decided to create content in the “tone and voice” of a well-known publication — and do so day in and day out. At the moment, the topic of scraping is the one that occupies the news media more so than the distribution angle, but the fight of record labels may tell us something about our future, too.

Hot off the presses: My new report on generative AI

My latest INMA report, News Media at the Dawn of Generative AI, dropped this very morning. Read it. Share it. I’d love to hear what you think about all the opportunities and pitfalls in front of us.

Webinar next week

Our next Smart Data community Webinar is on Wednesday, April 26, from 10-11 am (New York). Alex Held, senior data scientist at Der Spiegel, will speak about how his organisation is going about identifying valuable potential subscribers.

Free to all INMA members, so please join us by signing up here.

Further afield on the wide, wide Web

A few good reads from the wider world of data this week: 

  • In the Columbia Journalism Review, three researchers share the findings of their investigation of the reality of filter bubbles as seen through the prism of the personalisation of searches on Facebook, Google News, Twitter, and YouTube. They asked 1,600 people to perform these searches (knowing their results would be personalised) and found the results of similar queries tended to homogenise their exposure to information — that is, break people out of filter bubbles toward a mainstream. Not what people imagine when they think of personalised news!
  • Andrew DeVigal, the chair in Journalism Innovation and Civic Engagement at the University of Oregon School of Journalism, took a look on his Medium at "Her," the 2013 Spike Jonze movie in which the main character falls in love with an AI. Viewing the movie through the prism of the dawn of generative AI feels a bit different, and it made me want to watch it again.
  • More newsy: The New York Times took a look at the work Google is doing to catch up in the AI chatbot race and bring more AI to search (gift link). 

About this newsletter

Today's newsletter is written by Ariane Bernard, a Paris- and New York-based consultant who focuses on publishing utilities and data products, and is the CEO of a young incubated company.

This newsletter is part of the INMA Smart Data Initiative. You can e-mail me at with thoughts, suggestions, and questions. Also, sign up to our Slack channel.

