We like to think of ourselves as progressive and egalitarian here in the Nordics, ranking consistently among the most equal countries on the Gini index and, in Norway, clocking in second in last year’s Global Gender Gap report (after Iceland) from the World Economic Forum. In honour of the recent International Women’s Day on March 8, Norway’s largest local publisher, Amedia, took a hard look at how gender plays out in our journalism.

We write far more about men

How do gender roles express themselves in Amedia’s journalism — in the people we talk to and in the people and events we cover? In short, how does gender play out in the close to 2,000 stories we write every day across 72 local and one national Norwegian newspaper?

As it turns out, with a 34/66 female/male split in the names found in our stories, we’re nowhere near an ideal 50/50 split (or even the actual 49.6/50.4 split in the population).

Why? At this point, we can only speculate. But as we’ll see, some of the topics we cover probably have a far higher likelihood of having men in official positions. As such, this could be seen as an indication of the real gender gap in Norwegian society.

Nonetheless, our own actions as editors and journalists (our biases and our choices of who we talk to) are certainly part of the picture. We can, and should, do better. And getting a firm diagnosis in place is the first step to actually doing something about it.

So how, with currently more than 2,000 stories written every single day year round across 73 newspapers, do we go about determining how well we’re actually doing?

First, we need to diagnose our own coverage. Second, we need to create tools that enable our journalists to make more informed choices.

Data science and automatic classification

Believe it or not, we used to do the first part manually. As part of many of our newspapers’ annual editorial reports, we actually sat down and counted the number of men and women mentioned in our stories. It was a back-breaking effort with significant potential for error. It was also a model that clearly did not scale, so we stopped counting and instead focused on other measures of how well we were doing.

Enter the era of data and data science, particularly in the area of natural language processing (NLP). Long story short: We’re now able to automate the process, for every single story written.

First, we identify the actual persons mentioned in the text through entity recognition. That is, we identify the people, businesses, organisations, and similar entities referenced in our stories. This lets us tell apart the different things a name can refer to. For example, there are businesses with “Nina” in their names, there is the storm “Nina” battering Norway’s West Coast, and then there are actual people named “Nina.”

At the moment, we don’t distinguish names that are actual sources from other mentions, so a counted name might come from a table (e.g. sports results) rather than from an interviewee. But for our purposes in setting a diagnosis, the data should be precise enough.
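
A minimal sketch of the filtering idea, assuming the entity recognition step has already produced a list of labelled entities. The `Entity` type, the label names (`PERSON`, `ORG`, `EVENT`), and the sample entities are all hypothetical and are not taken from Amedia's actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # surface form found in the story
    label: str  # entity type assigned by the (hypothetical) recognition step


def person_names(entities):
    """Keep only entities tagged as persons; drop businesses, storms, etc."""
    return [e.text for e in entities if e.label == "PERSON"]


# Hypothetical recognition output for a single story (illustrative only).
entities = [
    Entity("Nina Hansen", "PERSON"),
    Entity("Nina", "EVENT"),         # the storm
    Entity("Ninas Kafé AS", "ORG"),  # a business
    Entity("Ola Nordmann", "PERSON"),
]

print(person_names(entities))  # ['Nina Hansen', 'Ola Nordmann']
```

Note that this keeps every person mention, whether it is an interviewee or a line in a results table, which mirrors the limitation described above.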

In the second part of the process, we match recognised names with Statistics Norway’s public database of male and female names.
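
The matching step could look roughly like the sketch below. It assumes a simple first-name lookup table; the `NAME_GENDER` dictionary is a tiny hand-written stand-in, not Statistics Norway's data, and `gender_of` and `female_share` are hypothetical helper names.

```python
# Tiny stand-in for a first-name register (assumption: the real table
# would be loaded from a public name database, not hard-coded).
NAME_GENDER = {"nina": "F", "kari": "F", "ola": "M", "per": "M"}


def gender_of(full_name):
    """Look up gender by first name; return None when the name is unknown."""
    parts = full_name.strip().split()
    if not parts:
        return None
    return NAME_GENDER.get(parts[0].lower())


def female_share(names):
    """Share of female names among those we could classify, or None."""
    genders = [g for g in map(gender_of, names) if g is not None]
    if not genders:
        return None
    return genders.count("F") / len(genders)


names = ["Nina Hansen", "Ola Nordmann", "Per Olsen", "Xyz Unknown"]
print(gender_of("Nina Hansen"))  # F
print(female_share(names))       # one of three classified names is female
```

Names missing from the table are skipped rather than guessed, so the share is computed only over names that could be classified.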

Finally, with this data firmly in place in our centralised data repository, we’re able to run analyses on all aspects of the data. Of particular interest here are content categories, which are another result of our NLP operation. Currently, we classify all stories across 20 main categories and a number of sub-categories corresponding roughly to the IPTC’s Media Content vocabulary with some local tweaks.
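
Once each recognised name carries both a gender and the category of the story it appeared in, a per-category breakdown is a simple aggregation. The sketch below assumes the data arrives as `(category, gender)` pairs; the function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def female_share_by_category(rows):
    """rows: iterable of (category, gender) pairs with gender "F" or "M".

    Returns a dict mapping each category to its share of female names.
    """
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for category, gender in rows:
        counts[category][gender] += 1
    return {
        category: n["F"] / (n["F"] + n["M"])
        for category, n in counts.items()
    }


# Made-up rows, one per recognised name (not real figures).
rows = [
    ("sport", "M"), ("sport", "M"), ("sport", "M"), ("sport", "F"),
    ("education", "F"), ("education", "M"),
]

shares = female_share_by_category(rows)
print(shares)  # {'sport': 0.25, 'education': 0.5}
```

The same aggregation works per newspaper or per month by swapping the first element of each pair for a different key.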

21 months, 660,000 stories

This is what 21 months of recognised names across 660,000 stories in 64 of our local newspapers looks like. As you can see, the average is a fairly consistent 34/66 split between women and men.

But this seemingly consistent picture actually hides significant variation.

Smaller newspapers more gender-balanced

The graph below shows the share of female names across all 64 newspapers in the data set.

The top performer has a 42% share of female names in its stories, while the newspaper at the other end of the list has a 28% share. Again, these are average figures across 21 months.

Seemingly, the everyday choices we make actually matter.

A striking factor is that our smaller newspapers skew toward the left side of the graph (with better gender representation), while our larger, regional newspapers are more often than not found on the right side.

This could indicate that, given the latter often work on regional and sometimes even national topics, the authority figures they get to interview are more often men. That could be as much a factor as the choices our journalists make.

In short, the answer is not clear-cut.

Gender imbalance varies with the topic

A final, striking difference becomes apparent when we map gender across story topics.

Some of these topics have few stories behind them — for instance, fortunately, there is little war coverage in local, Norwegian news (a mere 900 stories) — but other categories have a significant amount of production behind them. Education, which is made up of 47% female names, and health, which has 48%, are both represented by a healthy 30,000 stories each.

Still, the separation is striking: In disasters, emergencies, and accidents (37,000 stories), a mere 21% of all names are female. Presumably, police and authority figures in this category skew male, although those affected in all likelihood do not.

And in sports, the results are much the same. Our sports coverage leans heavily on the Norwegian football leagues, with a clear overabundance of interest in men’s football, and there were a whopping 130,000 stories during this period.

But what about the only topic where female names were in the clear majority, an area we dubbed “society”? It covers such “soft” topics as communities, demographics, families, discrimination, and welfare.

Perhaps the authority figures in the social sector skew female, but perhaps our own biases also lead us to interview more mothers than fathers? If true, this in itself indicates a gender imbalance worthy of a closer look.

From diagnosis to action

With a preliminary diagnosis in place, where do we go from here?

We certainly need to go into more detail, and as we get more answers, new questions obviously open up.

One of the most intriguing pieces of data to dig into is gender representation in our texts as measured against gender representation in our readership — not to mention gender representation within our subscriber base. In short, our hypothesis is that a better gender representation makes sound business sense.

With a 75% log-in rate across our local titles — and corresponding hard data on who’s reading what — we’ve seen clear indications this has been the case for quite some time. Indeed, some of our editorial teams utilise a specially developed dashboard to inform themselves on the gender gap in our readership.

The following graph shows how the share of female subscribers (i.e. where a female reader is tied to a subscription) correlates with the amount of female names in our stories across 19 of our newspapers in the same 21-month period.

The graph shows a clear correlation between the two. That is, those newspapers with more stories containing female names have a higher female readership. To put it mildly, that is interesting.
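
For readers who want to quantify a relationship like this, one common measure is the Pearson correlation coefficient. The sketch below is an assumption about how such a check could be done, not a description of the analysis behind the graph, and the two lists hold invented values rather than Amedia's figures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented values for five hypothetical newspapers (illustration only).
female_name_share = [0.28, 0.31, 0.34, 0.38, 0.42]
female_subscriber_share = [0.44, 0.46, 0.47, 0.50, 0.52]

r = pearson(female_name_share, female_subscriber_share)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 indicates that the two shares rise together. It says nothing about which one drives the other, which is why the financial follow-up matters.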

The next step would be to look at the financial performance of the same newspapers over time and calculate the business impact of balancing gender.

There are obviously other data points that would be great to dig into. For instance, quantitative figures of gender in local authority roles — from municipal matters and politics to sports — matched with our representation of the same topics would certainly reveal some interesting patterns.

One obvious goal is to create a feedback loop to our 850-odd journalists and editors all over Norway to offer them running data on gender imbalance. The gender dashboard represents the first iteration. However, the ultimate goal is to deliver better journalism for the local communities we publish in. This necessarily entails better representation of the women who make up 49.6% of the people living there.