Helena Bengtsson, editor of digital projects at the Guardian newspaper, does not like the term Big Data. “I don’t talk about Big Data,” she told delegates at INMA’s Big Data for Media conference at Google London on Friday. “I do ‘large data’ for journalism. Big Data is complex, and you can’t process it using traditional tools.”
Bengtsson gave a number of examples of how big, large and small data have been used in stories at the Guardian and around the world.
The first, Reading the Riots, a collaboration between the Guardian and the London School of Economics, was based on an analysis of 2.5 million tweets sent during the 2011 London riots.
She also mentioned the Center for Public Integrity’s data journalism project, Cracking the Codes. Based on data gathered from 84 million Medicare claims in the United States, it revealed that medical providers were collecting extra Medicare fees by exaggerating their claims.
Bengtsson then discussed a Big Data project by the Japanese broadcaster NHK.
This consisted of a series of documentaries based on “disaster data” from the 2011 Japanese earthquake and tsunami. NHK analysed reconstruction and recovery efforts using Big Data, including demographic trends derived from mobile phone signals, which showed where people were living after the disaster.
NHK’s data journalists collected and analysed information from 750,000 company computers, revealing that 20,000 business connections were lost after the earthquake. They also studied traffic movements in the period after the disaster, using signals from car satellite navigation systems.
Bengtsson said that, although it was an example of excellent data journalism, NHK had been able to use data that journalists would not normally have access to.
Bengtsson described the data in the WikiLeaks Iraq war logs as “the most exciting database I have ever worked with.”
“We analysed it using traditional and non-traditional methods,” she said. “One of the reasons I love data journalism is that it helps me to pick that needle out of the haystack. It is about finding the story, finding the detail, more than finding the trends.”
“We could have found more stories from the WikiLeaks data if we had had some of the tools we have now,” Bengtsson said.
Asked by an audience member for advice on how to persuade reporters not to be afraid of data, Bengtsson answered: “I don’t know why journalists think it is too difficult for them. I find it baffling that journalists can take on the complexities of stories, yet when you try to teach them to understand an Excel file, they panic.”
But, she said, the more widely data journalism is practised, the better journalists will get at it. “We just need stories, stories, stories,” Bengtsson said.