Enhancing existing data value with data fusion
Media Research Blog | 17 January 2017
One thing our diverse client base almost universally agrees upon is a desire for us to provide more data within our syndicated studies. At the same time, our clients are often reluctant to support the costs associated with collecting new data.
Collecting quality data is expensive, so the question we needed to answer was: how can we provide more data at a lower cost to end users?

One solution we have embraced at Nielsen Scarborough is the use of data fusion techniques to bring data from other reliable sources into our syndicated data sets.
Simply stated, data fusion is a method of integrating data by doing respondent-level matching of data sets. Respondents from one survey are paired with respondents from another survey.
The matching uses the common characteristics of the two surveys — usually demographic characteristics — along with other relevant information, such as media usage. The underlying principle of data fusion is that these common characteristics can reliably predict consumer behaviour.
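To make the idea concrete, here is a minimal sketch of respondent-level matching. It is an illustration only, not the procedure used in our studies: the linking variables, the field names, the sample respondents, and the choice of a simple nearest-neighbour distance are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical linking variables shared by both surveys (assumed for this example).
LINKING_VARS = ["age", "household_size", "weekly_tv_hours"]


def scale_factors(rows):
    """Range of each linking variable, so no single variable dominates the distance."""
    factors = {}
    for var in LINKING_VARS:
        values = [row[var] for row in rows]
        spread = max(values) - min(values)
        factors[var] = spread if spread > 0 else 1.0
    return factors


def distance(a, b, factors):
    """Scaled Euclidean distance between two respondents on the linking variables."""
    return sqrt(sum(((a[v] - b[v]) / factors[v]) ** 2 for v in LINKING_VARS))


def fuse(recipients, donors, donor_only_vars):
    """Pair each recipient with its closest donor and copy the donor-only answers."""
    factors = scale_factors(recipients + donors)
    fused = []
    for recipient in recipients:
        best = min(donors, key=lambda d: distance(recipient, d, factors))
        row = dict(recipient)
        for var in donor_only_vars:
            row[var] = best[var]
        row["donor_id"] = best["id"]  # keep the pairing so it can be audited later
        fused.append(row)
    return fused


# Made-up respondents: survey A lacks the attitudinal item, survey B has it.
recipients = [
    {"id": "A1", "age": 27, "household_size": 3, "weekly_tv_hours": 10},
    {"id": "A2", "age": 64, "household_size": 2, "weekly_tv_hours": 30},
]
donors = [
    {"id": "B1", "age": 62, "household_size": 2, "weekly_tv_hours": 28, "trusts_ads": False},
    {"id": "B2", "age": 29, "household_size": 4, "weekly_tv_hours": 12, "trusts_ads": True},
]

fused = fuse(recipients, donors, ["trusts_ads"])
for row in fused:
    print(row["id"], "matched to", row["donor_id"], "trusts_ads =", row["trusts_ads"])
```

A production fusion would typically use many more linking variables, weight them by importance, and limit how often a single donor can be reused; none of that is shown here.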
At Nielsen Scarborough, we currently provide two fused data sets through our partnerships with other quality data providers. One involves attitudinal information and the other focuses on healthcare information.
We have partnered with GfK MRI to include 13 categories of attitudinal information, already collected by GfK MRI, within our syndicated data sets. While we collect large amounts of data about the who, what, and where of consumer behaviour, the GfK MRI attitudinal data provides insight into why consumers do what they do.
Some of the attitudinal categories include buying styles, food, finance, technology, and, of course, advertising. The GfK MRI attitudinal data is merged with syndicated Nielsen Scarborough data at the respondent level, so the two data sets can be freely crossed, combined, and analysed.
This allows users to better understand the key motivators that drive behaviour without having to incur the costs of collecting that data.
The healthcare module is the result of a partnership with Kantar and draws on its MARS Consumer Health Study. The MARS study provides an extensive data set of healthcare insights, including actions that consumers take as a result of exposure to healthcare advertising; diagnosis and treatment of health conditions; usage of prescription (Rx) and over-the-counter remedies; diet and nutrition; and much more.
As with the attitudinal information, the MARS healthcare data is merged with syndicated Nielsen Scarborough data at the respondent level, allowing for easy analysis of information from both data sets. This gives our clients access to an extensive set of reliable healthcare information without having to collect any new data.
While data fusion may sound simple, it can in fact be quite complicated. A quality data fusion utilises many demographic and other variables to create an accurate picture of the consumer across all the common characteristics of the linked surveys.
The key to a reliable data fusion is choosing the best characteristics to link the two surveys, often referred to as “linking variables” or “hooks.” The reliability of a data fusion depends on how well the linking variables explain consumer behaviour, and the level of reliability can be assessed by statistical techniques.
Clearly, some aspects of consumer behaviour are more predictable than others. The purchase of baby products is directly correlated with the presence of zero- to 2-year-old children in the household, for example.
Some may argue that for products such as diapers or baby food, a simple analysis by the presence of zero- to 2-year-old children is sufficient and there is no need for data fusion. However, data fusion can provide a deeper understanding of this segment by providing insights into heavy users versus light users, brand preference, and the impact of advertising on evaluation and choice.
Like any modelling technique, data fusion requires validation. The only truly direct validation of a data fusion is to have a perfect single-source data set as a benchmark, but therein lies the rub: if a perfect single-source data set existed, there would be no need to produce a data fusion.
That said, there are several ways of assessing the reliability of a data fusion. Checks and balances are applied to the fusion process itself to ensure the various elements of the fusion are optimised.
What are the best linking variables to use, and how well do they predict behaviour? Are the linking variables matched well across the two data sets, and how closely are they matched?
With this in mind, we include questions within our surveys that are carefully designed to provide effective “hooks.” The reliability of the fused data sets is then evaluated with statistical tests and custom analyses.
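As a rough illustration of what such checks can look like, the sketch below computes two simple diagnostics: how closely matched pairs agree on each linking variable, and whether a fused variable keeps the same distribution it had in the donor survey. The data, field names, and function names are hypothetical, and these two checks are examples rather than a description of our actual test suite.

```python
def match_closeness(pairs, linking_vars):
    """Mean absolute gap between recipient and matched donor on each linking variable."""
    return {
        var: sum(abs(recipient[var] - donor[var]) for recipient, donor in pairs) / len(pairs)
        for var in linking_vars
    }


def share(rows, var):
    """Proportion of rows for which a yes/no variable is true."""
    return sum(1 for row in rows if row[var]) / len(rows)


# Made-up donor survey with one attitudinal item.
donors = [
    {"id": "B1", "age": 62, "weekly_tv_hours": 28, "trusts_ads": False},
    {"id": "B2", "age": 29, "weekly_tv_hours": 12, "trusts_ads": True},
    {"id": "B3", "age": 45, "weekly_tv_hours": 20, "trusts_ads": False},
]

# Made-up (recipient, matched donor) pairs; donor B2 is used twice, B3 never.
pairs = [
    ({"age": 27, "weekly_tv_hours": 10}, donors[1]),
    ({"age": 64, "weekly_tv_hours": 30}, donors[0]),
    ({"age": 31, "weekly_tv_hours": 14}, donors[1]),
]

closeness = match_closeness(pairs, ["age", "weekly_tv_hours"])

# Compare the fused distribution with the original donor-survey distribution.
fused_rows = [{"trusts_ads": donor["trusts_ads"]} for _, donor in pairs]
drift = abs(share(fused_rows, "trusts_ads") - share(donors, "trusts_ads"))

print("Mean gap per linking variable:", closeness)
print("Shift in share who trust ads:", round(drift, 3))
```

Small gaps on the linking variables suggest the pairs are well matched, while a large shift in the fused distribution is a warning sign, for instance that a few donors are being reused too often.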
Data fusion is not the solution to all data collection challenges. Clearly there are situations where new data must be collected to provide the answers that researchers seek. However, in situations where the resources for new data collection are not readily available, data fusion can provide the additional information we need, in a reliable way, at a significantly lower cost.