Sacramento Bee, Stanford University wade into Big Data waters
Innovative Advertising Solutions Blog | 23 July 2013

Many of us have been discussing how we might move into the era of Big Data and build actionable data sets that support an improved reader experience and better results for our advertisers.
For more than a year, we have been having conversations about the need for, and the benefits of, a better understanding of our audience.
We know we need to move beyond counting page views and monthly, daily, or local uniques as a measure of success.
We need to get to where we know who our digital customers are, where they come from, and what they value most when visiting us.
We need to offer a better and more personal user experience to our visitors and, in turn, we need to know our visitors more personally to better serve them and our advertisers.
We wondered: If we gained these insights, could we then improve the reader’s experience with us by making the content and the ads more relevant to each reader with each visit?
In other words, could we move a newspaper into the world of Big Data and algorithms?
Through some connections established by Tom Negrete, then The Sacramento Bee’s managing editor and now the newspaper’s director of innovation, we reached out to Stanford University to see if experts there might help.
Ann Grimes, director of the graduate journalism programme, liked our proposal. With her help, two teams of graduate computer science and journalism students were formed to develop open-source tools, which they presented to us in early June.
During the past semester, Negrete and Sean McMahon, director of digital media, led the charge in working with the student teams, which tackled the problem of getting to know and better engaging our online visitors.
Both teams shared a common goal: improving our visitors’ Web site experience. The student teams did field research, visiting several large newspapers, news organisations, and a variety of U.S. and European vendors.
One team of students worked at developing a different kind of commenting system that they called “Yes&.”
The prototype enables a visitor, while reading an article, to comment on a specific paragraph and to see other visitors’ comments and the comment volume for that paragraph, rather than only at the end of the article. The hope is that a better commenting experience will increase engagement and time on site.
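To make the paragraph-level idea concrete, here is a minimal sketch of how comments might be keyed to paragraphs instead of to the article as a whole. This is not the students’ Yes& code; the class and method names are hypothetical, and paragraphs are assumed to be identified simply by their position in the article.

```python
from collections import defaultdict


class ParagraphComments:
    """Hypothetical store for comments anchored to paragraphs of one article."""

    def __init__(self):
        # paragraph index -> list of (author, text) tuples
        self.by_paragraph = defaultdict(list)

    def add(self, paragraph_index, author, text):
        # Attach the comment to the paragraph the reader was on, not the article end.
        self.by_paragraph[paragraph_index].append((author, text))

    def comments_for(self, paragraph_index):
        # Comments for one paragraph; .get avoids creating empty entries.
        return list(self.by_paragraph.get(paragraph_index, []))

    def volume(self):
        # Comment count per paragraph, e.g. to show a badge beside busy paragraphs.
        return {i: len(c) for i, c in self.by_paragraph.items()}


store = ParagraphComments()
store.add(2, "reader_a", "Good point.")
store.add(2, "reader_b", "What is the source?")
store.add(5, "reader_a", "Agreed.")
print(store.volume())
```

With per-paragraph counts like these, a page could flag the most discussed passages in the margin as the reader scrolls.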

The second team created a system that connects metadata from the news content a visitor consumes to that visitor’s individual interests and reading habits. Each article read is scanned using natural language databases, capturing the visitor’s interests based on what he or she reads and shares.
If Web site visitors opt in to enable the process, they would passively build their own content profiles based on what they read on the Web site over time. Eventually, a regular visitor would have a consumption profile large enough to be studied and segmented by interest.
The algorithm’s API can connect to third-party systems that may be used to aggregate visitors into segment groups for additional content marketing.
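As a rough illustration of passive, opt-in profile building, the sketch below tallies the topics of each article a visitor reads. It is not the students’ implementation: the topic tags stand in for whatever the natural language scan actually extracts, and the class name and the `min_articles` threshold for a profile being “large enough” are hypothetical.

```python
from collections import Counter


class InterestProfile:
    """Hypothetical passive interest profile for a single visitor."""

    def __init__(self, opted_in):
        self.opted_in = opted_in
        self.topic_counts = Counter()
        self.articles_read = 0

    def record_read(self, article_topics):
        # Only build a profile if the visitor has explicitly opted in.
        if not self.opted_in:
            return
        self.articles_read += 1
        self.topic_counts.update(article_topics)

    def is_segmentable(self, min_articles=20):
        # Assumed rule: a profile is only large enough to segment after enough reads.
        return self.articles_read >= min_articles

    def top_interests(self, n=3):
        return [topic for topic, _ in self.topic_counts.most_common(n)]


profile = InterestProfile(opted_in=True)
profile.record_read(["politics", "sacramento"])
profile.record_read(["sports"])
profile.record_read(["politics", "budget"])
print(profile.top_interests(1))
```

Keeping the opt-in check inside `record_read` means nothing is accumulated for visitors who have not agreed to take part.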
As individual visitors build their own consumption profiles, by visiting content and clicking on ads, the concept is to offer related or topical content for visitors to consider viewing through a content recommendation engine. This would bring information to their attention that they might otherwise miss.
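One simple way a recommendation engine could use such a profile is to score unread articles by how much their topics overlap with what the visitor has already read. The sketch below assumes topic-count profiles and tagged candidate articles, and it ignores ad clicks; the `recommend` function and its inputs are hypothetical and say nothing about how the students’ engine actually ranks content.

```python
from collections import Counter


def recommend(topic_counts, candidates, already_read, k=3):
    """Rank unread candidate articles by overlap with a visitor's topic counts.

    topic_counts: Counter of topic -> times seen in the visitor's reading.
    candidates: dict of article_id -> list of topic tags (hypothetical metadata).
    """
    total = sum(topic_counts.values()) or 1
    scored = []
    for article_id, topics in candidates.items():
        if article_id in already_read:
            continue
        # Score = share of the visitor's topic observations matching this article.
        score = sum(topic_counts[t] for t in set(topics)) / total
        if score > 0:
            scored.append((score, article_id))
    # Highest score first; ties broken by article id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article_id for _, article_id in scored[:k]]


interests = Counter({"politics": 2, "sacramento": 1, "sports": 1})
candidates = {
    "a1": ["politics", "budget"],
    "a2": ["sports"],
    "a3": ["weather"],
    "a4": ["politics", "sacramento"],
}
print(recommend(interests, candidates, already_read={"a1"}))
```

Here the article on the visitor’s strongest interests ranks first, and an article with no overlap at all is never suggested.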
The desired results would be more clicking on content of interest, more time on site, more ads served, and a better user experience overall.
Add a single sign-on to the equation and we get more information on the visitor. And the visitor gets better content recommendations based on what he or she is interested in reading.
On the advertising side, in theory it would be possible to serve impressions against known visitor characteristics, such as visitor interest segments or demographics. However, this would require large enough volumes in our visitor interest segments to achieve successful scale in ad serving.
At the very least, the process would still supply us with more valuable information about our visitors that could enable a different kind of conversation with advertisers about the quality and value of our audience.
Both ideas were born of the goal to improve the visitor experience. That begins with an understanding of who our visitors are and what they value about us.
European newspapers are ahead in this space, but American publishers are now realising the opportunities and value that such data can provide.
Next steps: Some of us will be testing and working with this open-source code, as well as continuing discussions with other solution vendors and with companies whose data processes could add value for our visitors.
Delivering on a great user experience increases the likelihood that we will attract opt-in visitors who will gladly participate, given the value they receive in return.
Realising such an opportunity will still require hard work on issues like transparency, consumer perception, and adoption, as well as on an agreed-upon implementation. This process has shown the value of getting information into a structure suitable for analysis, and of having people who can do that analysis.
Whether or not either project is fully implemented, the exercise was a positive learning experience and a worthwhile effort. We are generating great conversations, stimulating broader thinking, and pushing ourselves harder to be better.
Hard work, but all good.