We live in interesting times.

Recently, a white supremacist group chose to march near my house as a show of “freedom of speech.” It was one of many such marches that have erupted across America. As I saw the footage on Twitter, I couldn’t help but think of members of my family who, generations ago, fled pogroms in Ukraine and fascism in Europe.

We aren’t there yet, but there is unmistakably a rising tide. We can only hope it is stopped far short of its potential.

Media companies should consider their responsibility in protecting user data.

We are often told data is the new oil. Media organisations that wish to make the transition to the Internet — or start-ups that wish to launch themselves freshly upon it — need to gather information about their readers. This data will allow them to make more nuanced decisions about what to publish and how to deliver it. Machine-learning algorithms will use these enormous pools of data to recommend exactly the right articles for each reader, keeping them engaged for longer and driving up advertising revenues.

“Machine learning” sounds like a magical Artificial Intelligence technology from a science fiction movie, but the way it works is relatively simple.

First, an organisation builds up a huge bank of pre-categorised data. A machine-learning algorithm can then use that data, and those categories, to attempt to categorise new data.

For example, based on a set of existing content recommendations, a machine-learning algorithm could attempt to make new recommendations. Those new recommendations can be corrected by humans and are then added to the data set. The algorithm then becomes more accurate over time, because it has more and more categorised data to work from.
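The loop described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not any real recommendation system: the articles, categories, and similarity measure are all made-up assumptions chosen to show the idea of categorising new data against an existing labelled set, then growing that set with corrections.

```python
# A minimal sketch of the learn-correct-retrain loop.
# All data and the similarity measure are illustrative assumptions.

def similarity(a, b):
    """Jaccard similarity between the word sets of two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def categorise(article, labelled):
    """Assign the category of the most similar labelled article."""
    best_text, best_category = max(
        labelled, key=lambda item: similarity(article, item[0])
    )
    return best_category

# 1. Start with a small bank of pre-categorised data.
labelled = [
    ("election results and polling data", "politics"),
    ("league final match report", "sport"),
]

# 2. The algorithm attempts to categorise new data.
guess = categorise("polling shows a tight election race", labelled)

# 3. A human confirms or corrects the guess, and the example joins
#    the data set, so future predictions draw on more labelled data.
labelled.append(("polling shows a tight election race", "politics"))
```

Real systems replace the word-overlap measure with statistical models trained on millions of examples, but the feedback loop is the same: more corrected, categorised data means more accurate predictions.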

In media, this data includes the interests people have and what kinds of content they read. Often, this is linked to personally identifying information for each user.

Effectively, users being tracked across the Web have a giant library record attached to them: a data set of virtually everything they’ve read across participating Web sites. This record contains inferred interests — things the tracking algorithm has decided they’d be interested in — that they might never get to see or correct. And the economic incentives are to continue to make that data set bigger and bigger.

One hallmark of authoritarian regimes in the 20th century was their use of libraries. Library records would often be reviewed to try to identify dissidents. People who read books considered to be against the interests of the state might be questioned. And people were targeted for discrimination (and much, much worse) by their ethnicity, sexuality, and political preferences. This practice is significantly easier in the 21st century.

We know targeting can be used to identify users’ sexuality and ethnicity based on what they read. Last year, it was discovered that Facebook assigns non-white users an “ethnic affinity.” Although placing ads that target or exclude races based on this was banned this year, the data continues to exist and could potentially be used for other purposes.

Although gathering vast sets of user data may increase engagement, we need to consider what the long-term implications of identifying reader preferences might be.

News is a particularly dangerous kind of information — it empowers citizens by giving them an understanding of what’s happening in the world around them. The kind of news we choose to consume as individuals can reveal what we care about, what our preferences are, and how we want to see the world. Our reading habits can single us out.

From May 2018, the European Union’s General Data Protection Regulation will require online services (including news Web sites) to ask for their users’ permission before they begin tracking and storing user data.

This is an admirable goal, but in a world where the shadow of fascism is beginning to encroach upon our seats of government, and where the President of the United States calls the news media the enemy of the American people, we should ask ourselves: should we be gathering this data at all? Or does this violate the duty of care news organisations have to their readers?

In my role at Matter, I’m looking for innovative media start-ups that help create an informed, inclusive, and empathetic society. As part of this, I’m interested in start-ups that are building new business models for media, which don’t rely on user tracking or profiling. If that’s you — or if you care about user privacy in online media — I would love to talk.